Thursday, 29 November 2018

Paper Review 8: Text coherence new method using word2vec sentence vectors and most likely n-grams



In this post, the paper "Text coherence new method using word2vec sentence vectors and most likely n-grams" is summarized.

Link to paper: https://ieeexplore.ieee.org/document/8311598  


Mohamad Abdolahi Kharazmi, Morteza Zahedi Kharazmi, Text coherence new method using word2vec sentence vectors and most likely n-grams, 3rd Iranian Conference on Intelligent Systems and Signal Processing (ICSPIS), Iran, 20-21 Dec. 2017, IEEE Xplore.

Summary:

This paper investigates the automatic evaluation of text coherence, a fundamental post-processing step in many NLP tasks such as machine translation and question answering.

The approach proposed in this paper combines word2vec vectors with the most likely n-grams to assess the coherence and topic integrity of a document. It relies on deep learning techniques that transform words into numerical vectors, together with statistical methods, to assess the coherence of texts.
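The summary does not say how the "most likely n-grams" are selected. As a minimal sketch, assuming they are simply the n-grams with the highest maximum-likelihood probability (count divided by the total number of n-grams) in a whitespace-tokenized corpus, they could be extracted as below. The function names `ngrams` and `most_likely_ngrams` and the parameters `n` and `k` are hypothetical, not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def most_likely_ngrams(sentences, n=2, k=3):
    """Return the k most probable n-grams with their probabilities.

    Hypothetical helper: probability is the maximum-likelihood estimate
    count(ngram) / total number of n-grams, an assumption for illustration.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), n))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(gram, count / total) for gram, count in counts.most_common(k)]


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat ran", "a dog sat"]
    print(most_likely_ngrams(corpus, n=2, k=1))
```

For example, on the three-sentence corpus above with `n=2, k=1`, the sketch returns the bigram `('the', 'cat')` with probability 2/6, because it is the only bigram that occurs twice.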

The authors evaluate text coherence with statistical methods at both the local and the global level, capturing text organization at the level of sentence-to-sentence and paragraph-to-paragraph transitions. The method does not rely on the meaning of words or on handcrafted rules, so it does not depend on a particular language or its semantic concepts and can be applied to any language.

Unlike other methods, this approach uses a different preprocessing pipeline.

First, each document is split into separate sentences. Second, a matrix is built for each sentence from word2vec word vectors. Finally, this matrix is normalized using an n-gram model.
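The three preprocessing steps can be sketched as follows. This is not the authors' implementation: `TOY_VECTORS` is a hypothetical two-dimensional stand-in for trained word2vec vectors, sentences are split naively on `.`, `!` and `?`, and because the summary does not describe the n-gram normalization, the sketch assumes it weights each word vector by an add-one-smoothed bigram probability P(w_i | w_{i-1}) estimated from the document, then takes the weighted average of the sentence matrix rows. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical 2-d toy embeddings standing in for trained word2vec vectors.
TOY_VECTORS = {
    "cats": [1.0, 0.0],
    "chase": [0.5, 0.5],
    "mice": [0.0, 1.0],
    "dogs": [1.0, 0.2],
}


def split_sentences(document):
    """Step 1: naive sentence split on ., ! and ?, then lowercase and tokenize."""
    parts = re.split(r"[.!?]+", document)
    return [p.strip().lower().split() for p in parts if p.strip()]


def sentence_matrix(tokens, vectors):
    """Step 2: one row per in-vocabulary token, in sentence order."""
    return [vectors[t] for t in tokens if t in vectors]


def bigram_model(sentences):
    """Add-one-smoothed P(word | prev) estimated from the sentences (assumption)."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        vocab.update(tokens)
        histories.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    size = len(vocab)

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (histories[prev] + size)

    return prob


def normalized_sentence_vector(tokens, vectors, prob):
    """Step 3 (assumed): bigram-weighted average of the sentence matrix rows."""
    matrix = sentence_matrix(tokens, vectors)
    if not matrix:
        return None  # no in-vocabulary words in this sentence
    prevs = ["<s>"] + tokens[:-1]
    weights = [prob(p, t) for p, t in zip(prevs, tokens) if t in vectors]
    total = sum(weights)
    return [sum(w * row[j] for w, row in zip(weights, matrix)) / total
            for j in range(len(matrix[0]))]


if __name__ == "__main__":
    sentences = split_sentences("Cats chase mice. Dogs chase cats.")
    prob = bigram_model(sentences)
    for tokens in sentences:
        print(tokens, normalized_sentence_vector(tokens, TOY_VECTORS, prob))
```

With the document `"Cats chase mice. Dogs chase cats."`, the first sentence reduces to the vector `[0.5, 0.5]`. Weighting by bigram probability lets locally likely word sequences contribute more to the sentence vector; other normalizations would be equally plausible readings of the summary.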
Stop-word removal, stemming, and POS tagging are not performed.
However, some basic preprocessing is applied, such as removing the spacing between words and
punctuation marks, removing extra space characters between words, and unifying accented characters.
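A minimal sketch of this light cleanup, using only the Python standard library. It assumes that "unifying accented characters" means mapping composed and decomposed Unicode forms to a single form (NFC), reads the punctuation step as stripping punctuation marks so they no longer stick to words, and handles ASCII punctuation only; `basic_preprocess` is a hypothetical name. In keeping with the paper, no stop words are removed and no stemming is applied.

```python
import re
import string
import unicodedata

# Map every ASCII punctuation mark to a space (ASCII-only is an assumption).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def basic_preprocess(text):
    """Light cleanup: no stop-word removal, stemming or POS tagging."""
    # Unify composed and decomposed accented characters (NFC form).
    text = unicodedata.normalize("NFC", text)
    # Strip punctuation marks so they no longer stick to words.
    text = text.translate(_PUNCT_TO_SPACE)
    # Collapse extra whitespace between words and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(basic_preprocess("Hello ,  world!  cafe\u0301"))
```

Here `basic_preprocess("Hello ,  world!  cafe\u0301")` returns `"Hello world café"`: the decomposed `e` plus combining acute accent is unified into the single character `é`.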
The approach also differs in how local coherence is measured. In previous approaches, local coherence is tested over a few consecutive sentences, so sections that are far apart may end up with no measured relation. In the proposed approach, local coherence is assessed at the level of a paragraph, and a coherent paragraph is considered a locally coherent section.
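The paragraph-level view of local coherence can be illustrated with a short sketch. It assumes each paragraph is already a list of sentence vectors (for example, averaged word2vec vectors) and that coherence between adjacent units is measured by cosine similarity, a common choice that the summary does not confirm. Local coherence is the mean similarity of consecutive sentences within a paragraph, a paragraph is flagged as locally coherent above a hypothetical threshold of 0.5, and global coherence is the mean similarity of consecutive paragraph centroids. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_adjacent_similarity(vectors):
    """Average cosine similarity of each vector with the next one."""
    pairs = list(zip(vectors, vectors[1:]))
    if not pairs:
        return 1.0  # a single unit is treated as trivially coherent (assumption)
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def coherence(paragraphs, threshold=0.5):
    """Score a document given as a list of paragraphs of sentence vectors.

    Returns the local score of each paragraph, a flag per paragraph telling
    whether it counts as locally coherent (hypothetical 0.5 threshold), and
    a global score over paragraph-to-paragraph transitions.
    """
    local = [mean_adjacent_similarity(p) for p in paragraphs]
    flags = [score >= threshold for score in local]
    global_score = mean_adjacent_similarity([centroid(p) for p in paragraphs])
    return local, flags, global_score


if __name__ == "__main__":
    doc = [[[1, 0], [1, 0]], [[0, 1], [0, 1]]]
    print(coherence(doc))
```

For two paragraphs whose sentences are `[1, 0], [1, 0]` and `[0, 1], [0, 1]`, both local scores are 1.0, while the global score is 0.0 because the two paragraphs point in orthogonal directions.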

To conclude, this model is effective: it is robust across languages and domains, and it is not computationally expensive.

Presentation: Dorra EL MEKKI

Link: presentation_dorraElMekki