Paper Review 8: Text coherence new method using word2vec sentence vectors and most likely n-grams
In this post, the paper "Text coherence new method using word2vec sentence vectors and most likely n-grams" is summarized.
Link to paper: https://ieeexplore.ieee.org/document/8311598
Mohamad Abdolahi Kharazmi, Morteza Zahedi Kharazmi, "Text coherence new method using word2vec sentence vectors and most likely n-grams", 3rd Iranian Conference on Intelligent Systems and Signal Processing (ICSPIS), Iran, 20-21 Dec. 2017. IEEE Xplore.
Summary:
This paper investigates the automatic evaluation of text coherence, a fundamental post-processing step in many NLP tasks such as machine translation and question answering.
The proposed approach combines word2vec vectors with most likely n-grams to assess the coherence and topic integrity of a document. It relies on deep learning to transform words into numerical vectors and on statistical methods to assess the coherence of texts.
The authors evaluate text coherence statistically at both the local and the global level, capturing text organization in sentence-to-sentence and paragraph-to-paragraph transitions, without relying on the meaning of words or on handcrafted rules. The approach therefore does not depend on a particular language or its semantic concepts, and it can be applied to any language.
Unlike other methods, the preprocessing here is different. First, each document is split into separate sentences. Second, a matrix is built for each sentence from the word2vec vectors of its words. Finally, the matrix is normalized using an n-gram model. Stop-word removal, stemming and POS tagging are not performed.
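The first two steps can be sketched as follows. This is only an illustration, not the authors' implementation: the toy 3-dimensional vectors in `TOY_VECTORS` stand in for a trained word2vec model, the sentence splitter is deliberately naive, skipping out-of-vocabulary words is an assumption, and the n-gram normalization step is left out because the summary does not describe it in detail.

```python
import re
import numpy as np

# Hypothetical toy vectors standing in for a trained word2vec model.
TOY_VECTORS = {
    "cats": np.array([0.9, 0.1, 0.0]),
    "purr": np.array([0.8, 0.2, 0.1]),
    "dogs": np.array([0.7, 0.3, 0.0]),
    "bark": np.array([0.6, 0.4, 0.1]),
}

def split_sentences(document):
    """Split after '.', '!' or '?' followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]

def sentence_matrix(sentence, vectors, dim):
    """Stack the vectors of the sentence's words, one row per word.

    No stop-word removal, stemming or POS tagging is applied.
    Words missing from the vocabulary are skipped (an assumption).
    """
    words = re.findall(r"\w+", sentence.lower())
    rows = [vectors[w] for w in words if w in vectors]
    if not rows:
        return np.zeros((0, dim))
    return np.vstack(rows)

document = "Cats purr. Dogs bark."
sentences = split_sentences(document)
matrices = [sentence_matrix(s, TOY_VECTORS, dim=3) for s in sentences]
```

Each entry of `matrices` has one row per recognized word and one column per vector dimension, so sentences of different lengths produce matrices with different row counts.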
However, some basic preprocessing is applied: removing spacing between words and punctuation marks, removing extra space characters between words, and unifying accented characters.
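A minimal sketch of such a cleanup, using only the Python standard library, might look like this. The function name `basic_preprocess` is hypothetical, and reading "unification of accented characters" as stripping diacritics is an assumption; the paper may normalize characters differently.

```python
import re
import unicodedata

def basic_preprocess(text):
    # Unify accented characters: decompose them, then drop combining marks
    # (assumed interpretation of "unification").
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove whitespace that sits between a word and a punctuation mark.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

For example, `basic_preprocess("Café  is   open .")` yields `"Cafe is open."`.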
The preprocessing is not the only difference. In previous approaches, local coherence is tested at the level of several consecutive sentences, so sections that are far apart may show no relation. In the proposed approach, local coherence is instead assessed at the level of a paragraph, and a coherent paragraph is treated as a locally coherent section.
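To make the paragraph-level idea concrete, here is a rough sketch that scores local coherence inside each paragraph and global coherence across paragraphs. It is not the paper's metric: the use of mean cosine similarity between neighbours, the averaging of sentence vectors into a paragraph vector, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def adjacent_similarity(vectors):
    """Mean cosine similarity between consecutive vectors.

    A sequence with fewer than two vectors is scored 1.0 (an assumption).
    """
    if len(vectors) < 2:
        return 1.0
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return float(np.mean(sims))

def coherence_scores(paragraphs):
    """paragraphs: list of paragraphs, each a list of sentence vectors.

    Returns one local score per paragraph and a single global score
    computed over paragraph-to-paragraph transitions.
    """
    local = [adjacent_similarity(p) for p in paragraphs]
    paragraph_vectors = [np.mean(p, axis=0) for p in paragraphs]
    global_score = adjacent_similarity(paragraph_vectors)
    return local, global_score
```

A sentence vector could be obtained, for instance, by averaging the rows of a sentence's word-vector matrix. With two internally consistent paragraphs on unrelated topics, the local scores would be high while the global score would be low.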
To conclude, this model is effective. It is robust across languages and domains, and it does not suffer from computational complexity.