Thursday, October 25, 2018

Paper Review 3: Efficient Estimation of Word Representations in Vector Space



In order to find an efficient algorithm for our term project, which has to measure syntactic and semantic word similarities, I have chosen this paper. It introduces two of the best model architectures for computing continuous vector representations of words from very large data sets and compares them to other well-known techniques based on different types of neural networks, which we shall analyze during our work on the term project.

Word2vec is a particularly computationally efficient predictive model for learning word embeddings from raw text. It comes in two flavors: the Continuous Bag-of-Words (CBOW) model and the Skip-Gram model.

Algorithmically, these models are similar, except that CBOW predicts target words from source context words, while skip-gram does the inverse and predicts source context words from the target words.

According to Mikolov, the skip-gram model works well with a small amount of training data and represents even rare words well, while the CBOW model is several times faster to train than skip-gram and achieves slightly better accuracy for frequent words.
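To make the two architectures concrete, here is a minimal training sketch using the gensim library (my choice of tooling, not the paper's original C implementation); the toy corpus and hyperparameter values are purely illustrative, and the `sg` flag switches between CBOW and skip-gram:

```python
# Minimal sketch: training both word2vec architectures with gensim (>= 4.0).
# Toy corpus and hyperparameters are illustrative, not the paper's setup.
from gensim.models import Word2Vec

corpus = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

# sg=0 -> CBOW: predict the target word from its surrounding context words.
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

# sg=1 -> skip-gram: predict the surrounding context words from the target word.
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(cbow.wv.most_similar("king", topn=3))
print(skipgram.wv.most_similar("king", topn=3))
```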

In order to maximize accuracy while minimizing the computational complexity of a model (defined in the paper as the number of parameters that need to be accessed to fully train the model, and used to compare the different architectures), the authors developed new model architectures that preserve the linear regularities among words. They also designed a new comprehensive test set for measuring both syntactic and semantic regularities, and showed that many such regularities can be learned with high accuracy.
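The linear regularities in question are the word-analogy relations of the paper's test set, e.g. vector("King") - vector("Man") + vector("Woman") lies closest to vector("Queen"). A small illustration with pretrained vectors loaded through gensim's downloader (again my choice of tooling; the pretrained model is a large download) could look like this:

```python
# Sketch of the analogy test on pretrained vectors; 'word2vec-google-news-300'
# is roughly 1.6 GB, so this is for illustration rather than a quick experiment.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")

# vector('king') - vector('man') + vector('woman') should rank 'queen' highest.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```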

Moreover, the paper discusses how training time and accuracy depend on the dimensionality of the word vectors and on the amount of training data.



Conference: Proceedings of the International Conference on Learning Representations (ICLR 2013)

By: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
(Submitted on 16 Jan 2013 (v1), last revised 7 Sep 2013)

Thursday, October 18, 2018

Paper Review 2: A Document Descriptor using Covariance of Word Vectors





Marwan Torki, A Document Descriptor using Covariance of Word Vectors, 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 527–532, Melbourne, Australia, July 15-20, 2018.


Summary:


     This paper presents a novel document descriptor based on the covariance matrix of word vectors.

     To begin, the Document-Covariance descriptor (DoCov) represents every text as the covariance of the word embeddings (see https://en.wikipedia.org/wiki/Word_embedding) of its words, yielding a fixed-length representation of the paragraph. This representation captures the interrelationships between the dimensions of the word embedding via the elements of the covariance matrix.
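As a concrete illustration, here is a minimal numpy sketch of the idea as I understand it (my own reconstruction from the paper's description, not the author's code): stack the embeddings of a document's words into a matrix, compute the covariance across embedding dimensions, and keep the upper triangle as a fixed-length descriptor. The last line also shows the concatenation with the mean vector mentioned in the conclusion below.

```python
import numpy as np

def docov(word_vectors: np.ndarray) -> np.ndarray:
    """Fixed-length DoCov-style descriptor from an (n_words, dim) embedding matrix."""
    # Covariance across embedding dimensions: a (dim, dim) symmetric matrix.
    cov = np.cov(word_vectors, rowvar=False)
    # Since the matrix is symmetric, the upper triangle (with the diagonal)
    # carries all the information: dim * (dim + 1) / 2 values.
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]

# Toy example: 6 "words" with 4-dimensional embeddings (random for illustration).
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(6, 4))

descriptor = docov(doc_embeddings)   # length 4*5/2 = 10, the same for any document
combined = np.concatenate([doc_embeddings.mean(axis=0), descriptor])  # mean vector + DoCov
print(descriptor.shape, combined.shape)
```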

     Moreover, DoCov has several advantages compared to earlier methods. By earlier methods we mean:
First, vector space models via the bag-of-words (BOW) representation, in which a text is represented as the bag of its words, disregarding word order but keeping multiplicity (a toy sketch of BOW follows after this list).
Second, Latent Semantic Indexing (LSI), a mathematical method developed to improve the accuracy of information retrieval.
Third, deep learning methods that learn word vector representations.
Here are some advantages of DoCov:
First, it is a fixed-length representation of the paragraph, which makes it easy to use in both supervised and unsupervised applications.
Furthermore, according to the experimental evaluations, using the covariance as a spatial descriptor of multivariate data fits different tasks with competitive performance against older methods.
Finally, compared to earlier methods, the computation of the covariance descriptor is fast and highly parallelizable because there are no inference steps involved.
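As promised above, the bag-of-words representation can be sketched in a couple of lines (a toy illustration only, not the paper's pipeline):

```python
# Toy bag-of-words: word order is discarded, multiplicity is kept.
from collections import Counter

text = "the cat sat on the mat"
bow = Counter(text.split())
print(bow)  # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```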

     To conclude, not only is DoCov consistently better than state-of-the-art methods on the text classification benchmark, but concatenating it with baselines such as the mean vector and BOW also yields better results.

Thursday, October 11, 2018

Review on " Lexical Interference over Multi-Word Predicates: A Distributional Approach "


"Lexical Interference over Multi-Word Predicates: A Distributional Approach" is the title of the article we are discussing in this review. In fact, this document describes a new approach, of using latent variables in modeling the predicate's lexical components (LCs) in order to consider the most relevant LCs while making prediction. To begin with, a brief definition of "Multi-Word Expressions (MWEs)" which are complex lexical units and "Multi-Word Predicates (MWP)" which are informally defined as multiple words that constitutes a single predicate. MWPs form the most important sub-class of MWEs. The proposed approach to the task is complementary to most others, in which they use distributional similarity as a major component within their system. In fact, the previous works focused on improving the quality of distributional representations themselves. However, this one focuses on the integration of this type of representation to improve the identification of inference relations between MWPs. Since MWPs demonstrate varying levels of compositionality, a uniform treatment of MWPs either as fixed expressions or through head words is lacking. Instead, our approach integrates multiple lexical units contained in the predicate. The approach considers both multi-word LCs. As the method is aimed to discover the most relevant LCs, they do not attempt to analyze the MWPs in advance, but rather take an inclusive set of allowable LCs for a given predicate, allowing the model to estimate the relative weights of the LCs.  Finally, we assume that adopting an approach that combine multiple analyses would perform better than standard single-analysis methods in a large range of applications.

Lexical Inference over Multi-Word Predicates: A Distributional Approach

Presentation: Dorra EL MEKKI

Link: presentation_dorraElMekki