For our term project, which requires measuring both syntactic and semantic word similarities, I needed an efficient algorithm. I chose this paper because it introduces two of the best model architectures for computing continuous vector representations of words from very large data sets and compares them to other well-known techniques based on different types of neural networks, which we will analyze during our work on the term project.
Word2vec is a particularly computationally efficient predictive model for learning word embeddings from raw text. It comes in two flavors: the Continuous Bag-of-Words model (CBOW) and the Skip-Gram model. Algorithmically, the two models are similar, except that CBOW predicts the target word from its surrounding context words, while Skip-Gram does the inverse and predicts the context words from the target word (see the sketch below).
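To make the difference concrete, here is a minimal sketch using the gensim library (my choice for illustration; the paper's authors released their own original C implementation), where the sg flag switches between the two architectures. The toy corpus and parameter values are arbitrary assumptions for demonstration only.

```python
# Minimal sketch of the two word2vec flavors via gensim (illustrative only).
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "man", "walks", "in", "the", "city"],
    ["a", "woman", "walks", "in", "the", "city"],
]

# CBOW: predict the target word from its surrounding context words (sg=0).
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

# Skip-Gram: predict the surrounding context words from the target word (sg=1).
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["king"][:5])       # first few dimensions of the CBOW vector
print(skipgram.wv["king"][:5])   # first few dimensions of the Skip-Gram vector
```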
According to Mikolov, the Skip-Gram model works well with small amounts of training data and represents even rare words well, whereas the CBOW model is several times faster to train and achieves slightly better accuracy for frequent words.
To maximize accuracy while minimizing computational complexity, defined as the number of parameters that need to be accessed to fully train the model and used to compare the different architectures, the authors developed new model architectures that preserve the linear regularities among words. They designed a new comprehensive test set for measuring both syntactic and semantic regularities, and they show that many such regularities can be learned with high accuracy, as illustrated by the analogy sketch below.
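The kind of linear regularity the paper's analogy test probes can be checked directly with vector arithmetic. The sketch below, again using gensim and assuming a model already trained on a sufficiently large corpus (the file name is hypothetical), asks for the word closest to vector("king") - vector("man") + vector("woman"), which the paper reports to be "queen".

```python
# Hypothetical sketch: assumes a Word2Vec model trained on a large corpus
# (a toy corpus will not exhibit this behaviour).
from gensim.models import Word2Vec

model = Word2Vec.load("word2vec_large_corpus.model")  # hypothetical path

# The analogy "man is to king as woman is to ?" becomes vector arithmetic:
# vector('king') - vector('man') + vector('woman') should lie close to 'queen'.
result = model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # e.g. [('queen', 0.7...)] on a well-trained model
```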
Moreover, the paper discusses how training time and accuracy depend on the dimensionality of the word vectors and on the amount of training data; the short sketch below shows how the dimensionality enters as a parameter in practice.
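As a rough illustration of the cost side of that trade-off, this sketch trains the same CBOW model at two different vector dimensionalities and times each run; the corpus and parameter values are arbitrary, so only the relative timings are meaningful.

```python
# Rough illustration: train the same CBOW model at two dimensionalities
# and time each run. Corpus and parameter values are arbitrary.
import time
from gensim.models import Word2Vec

corpus = [["word%d" % i for i in range(j, j + 20)] for j in range(0, 2000, 5)]

for dim in (50, 300):
    start = time.time()
    Word2Vec(corpus, vector_size=dim, window=5, min_count=1, sg=0, epochs=5)
    print("vector_size=%d trained in %.2f s" % (dim, time.time() - start))
```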
Conference Paper: Efficient Estimation of Word Representations in Vector Space
Conference: Proceedings of the International Conference on Learning Representations (ICLR 2013)
By: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean