Thursday, October 18, 2018

Paper Review 2: A Document Descriptor using Covariance of Word Vectors





Marwan Torki, A Document Descriptor using Covariance of Word Vectors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 527–532, Melbourne, Australia, July 15-20, 2018.


Summary:


     This paper presents a novel document descriptor based on the covariance matrix of word vectors.

     To begin, the Document-Covariance descriptor (DoCoV) represents every text as the covariance of the word embeddings (see https://en.wikipedia.org/wiki/Word_embedding) of its words. Because the covariance matrix has one entry per pair of embedding dimensions, its size depends only on the embedding dimension and not on the number of words, so every paragraph gets a fixed-length representation. The elements of the covariance matrix capture how the dimensions of the word embedding vary together within the text.
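
     The idea in the paragraph above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the author's implementation: the function name `docov`, the random stand-in embeddings, the normalisation by the number of words (`bias=True`) and the flattening of the upper triangle with off-diagonal entries scaled by sqrt(2) are all assumptions made for this example.

```python
import numpy as np

def docov(word_vectors):
    """Covariance descriptor of one document.

    word_vectors: array of shape (n_words, d), one embedding per word.
    Returns a vector of length d * (d + 1) / 2.
    """
    X = np.asarray(word_vectors, dtype=float)
    # Covariance between embedding dimensions: a symmetric d x d matrix.
    # bias=True divides by n_words (an assumption for this sketch).
    C = np.cov(X, rowvar=False, bias=True)
    # The matrix is symmetric, so the upper triangle holds all the information.
    rows, cols = np.triu_indices(C.shape[0])
    v = C[rows, cols].copy()
    # Assumed scaling: multiply off-diagonal entries by sqrt(2) so that the
    # Euclidean norm of v equals the Frobenius norm of C.
    v[rows != cols] *= np.sqrt(2.0)
    return v

# Random vectors stand in for real word embeddings (d = 4 here).
rng = np.random.default_rng(0)
short_doc = rng.normal(size=(5, 4))    # a 5-word text
long_doc = rng.normal(size=(50, 4))    # a 50-word text

desc_short = docov(short_doc)
desc_long = docov(long_doc)
print(desc_short.shape, desc_long.shape)
```

     Both texts end up with a descriptor of the same length even though one has ten times more words, which is the fixed-length property the review refers to.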

     Moreover, DoCoV has several advantages compared to earlier methods. By earlier methods we mean:
First, vector space models based on the bag-of-words (BOW) representation. In this model, a text is represented as the bag of its words, disregarding word order but keeping multiplicity.
Second, latent semantic indexing (LSI), a mathematical method that applies singular value decomposition to the term-document matrix and was developed to improve the accuracy of information retrieval.
Third, deep learning methods that learn word vector representations.
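
To make the bag-of-words model from the list above concrete, here is a small sketch. The helper `bag_of_words`, the whitespace tokenisation and the toy vocabulary are assumptions chosen for illustration only; they do not come from the paper.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    # Naive tokenisation: lowercase and split on whitespace (an assumption).
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never occur.
    return [counts[word] for word in vocabulary]

vocab = ["the", "cat", "sat", "dog"]       # toy vocabulary
a = bag_of_words("the cat sat", vocab)
b = bag_of_words("sat the cat", vocab)     # same words, different order
c = bag_of_words("the dog the cat", vocab) # "the" occurs twice
print(a, b, c)
```

The first two texts get the same vector because word order is ignored, while the repeated word in the third text is counted twice because multiplicity is kept.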
Here are some advantages of DoCoV:
First, it has a fixed-length representation of the paragraph, which makes it easy to use in both supervised and unsupervised applications.
Furthermore, according to the experimental evaluations in the paper, the covariance, used as a spatial descriptor for multivariate data, fits different tasks with competitive performance against earlier methods.
Finally, compared to earlier methods, the covariance descriptor is fast to compute and highly parallelizable because no inference steps are involved.

     To conclude, according to the paper, not only is DoCoV consistently better than state-of-the-art methods on the text classification benchmark, but concatenating it with baselines such as the mean vector and BOW also gives better results.
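
     The concatenation mentioned in the conclusion can be pictured as follows. This is a hedged sketch rather than the paper's pipeline: the helper names `mean_vector`, `covariance_descriptor` and `combined_features` are hypothetical, random vectors stand in for real word embeddings, and the covariance is simply flattened from its upper triangle without any extra scaling.

```python
import numpy as np

def mean_vector(X):
    """Baseline descriptor: average of the word embeddings."""
    return X.mean(axis=0)

def covariance_descriptor(X):
    """Upper triangle of the covariance between embedding dimensions."""
    C = np.cov(X, rowvar=False, bias=True)
    return C[np.triu_indices(C.shape[0])]

def combined_features(X):
    """Concatenate the mean vector with the covariance descriptor."""
    return np.concatenate([mean_vector(X), covariance_descriptor(X)])

rng = np.random.default_rng(1)
doc = rng.normal(size=(20, 3))   # 20 words, embedding dimension d = 3
features = combined_features(doc)
print(features.shape)            # d + d * (d + 1) / 2 = 3 + 6 = 9 values
```

     The combined vector is still fixed-length, so it can be passed to any standard classifier in the same way as either descriptor alone.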


Presentation: Dorra EL MEKKI

Link: presentation_dorraElMekki