Paper Review 10 : Sentence Similarity Based on Semantic Vector Model

This post summarizes the article "Sentence Similarity Based on Semantic Vector Model".

Written by :

Zhao Jingling, School of Computer, Beijing University of Posts and Telecommunications, Beijing, China
Zhang Huiyun, National Engineering Laboratory for Mobile Network Security, School of Computer, Beijing University of Posts and Telecommunications, Beijing, China
Cui Baojiang, National Engineering Laboratory for Mobile Network Security, School of Computer, Beijing University of Posts and Telecommunications, Beijing, China

Published in : 2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing

Link to paper : Sentence Similarity Based on Semantic Vector Model

Summary:

According to the authors, a sentence is “considered to be a sequence of words each of which carries useful information … Words sequence contains semantic features, word order information and structural characteristics of sentence, which sentence similarity relies on.”

Earlier methods for computing sentence similarity were designed for long text documents. This paper instead presents a new approach for very short texts of sentence length, based on semantic information, structural information and word order information.

The proposed approach combines:

Semantic similarity between sentences based on:

The word semantic similarity: it can be obtained using dictionary/thesaurus-based methods or corpus-based methods. HowNet, which defines a word in a complicated multidimensional knowledge description language, is the lexical knowledge base employed in this research. In HowNet, a word is described by a group of small units (sememes) that express its meaning.
The structure of sentences

Each sentence is represented by a semantic vector, computed over the joint word set of the two sentences' word sets, whose element values lie in the range [0,1].
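
The semantic vector construction described above can be sketched in Python. This is only an illustration under stated assumptions: the TOY_SIMILARITY table and the word_similarity function are hypothetical stand-ins for a HowNet-based word similarity, and comparing the two vectors with a cosine is an assumption made here, not a detail taken from the paper.

```python
import math

# Hypothetical toy lexicon standing in for a HowNet-based word similarity.
TOY_SIMILARITY = {
    frozenset(("dog", "puppy")): 0.8,
    frozenset(("runs", "walks")): 0.6,
}

def word_similarity(a, b):
    """Return a similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)

def joint_word_set(words1, words2):
    """Distinct words of both sentences, in order of first appearance."""
    return list(dict.fromkeys(words1 + words2))

def semantic_vector(words, joint):
    """One entry per joint-set word: best similarity to any word of the
    (non-empty) sentence, so every entry lies in [0, 1]."""
    return [max(word_similarity(j, w) for w in words) for j in joint]

def cosine(u, v):
    """Cosine of two equal-length vectors (assumed comparison step)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

s1 = "the dog runs".split()
s2 = "the puppy walks".split()
joint = joint_word_set(s1, s2)
v1 = semantic_vector(s1, joint)
v2 = semantic_vector(s2, joint)
semantic_sim = cosine(v1, v2)
```

A word that appears in the sentence contributes 1, and a word that appears only in the other sentence contributes its best match, which is what keeps the element values inside [0,1].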

Word order similarity between sentences: this provides information about the relationships between words, since word order plays a role in conveying the meaning of a sentence.
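
As an illustration of how word order can be turned into a score, here is a minimal Python sketch. It assumes the normalized-difference measure used in Li Yuhua's method (one minus the norm of the difference of the two order vectors divided by the norm of their sum). The exact formula in the paper may differ, and this simplified version only matches identical words. The function names are hypothetical.

```python
import math

def word_order_vector(words, joint):
    """1-based position of each joint-set word in the sentence, 0 if absent."""
    position = {}
    for i, w in enumerate(words, start=1):
        position.setdefault(w, i)  # keep the first occurrence only
    return [position.get(j, 0) for j in joint]

def word_order_similarity(words1, words2):
    """Assumed measure: 1 - ||r1 - r2|| / ||r1 + r2||."""
    joint = list(dict.fromkeys(words1 + words2))
    r1 = word_order_vector(words1, joint)
    r2 = word_order_vector(words2, joint)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0

same = word_order_similarity("dog bites man".split(), "dog bites man".split())
swapped = word_order_similarity("dog bites man".split(), "man bites dog".split())
```

Two sentences with the same words in the same order score 1, while "dog bites man" and "man bites dog" score lower even though they share every word.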

The overall sentence similarity is then defined as a weighted combination of the semantic similarity and the word order similarity, where epsilon is the factor weighting semantic information against word order information. Its value, 0.85, was found empirically.
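
The formula itself is not reproduced in this post, so the following Python sketch assumes the usual convex combination, in which epsilon weights the semantic similarity and (1 - epsilon) weights the word order similarity. The function name and the input scores are hypothetical.

```python
EPSILON = 0.85  # value reported above as empirically found

def overall_similarity(semantic_sim, order_sim, epsilon=EPSILON):
    """Assumed form: epsilon * semantic + (1 - epsilon) * word order."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    return epsilon * semantic_sim + (1.0 - epsilon) * order_sim

# Made-up scores, used only to show the weighting.
score = overall_similarity(0.95, 0.59)
```

With epsilon at 0.85, the semantic score dominates and word order acts as a correction, so two sentences with the same words in a different order still score high, but not as high as an exact match.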

Finally, this approach shows the best results compared with other methods, such as the semantic and word order based method proposed by Li Yuhua, a purely semantic method, and a purely word order based method. That is why we aim to use it in our project.

Natural Language Processing - Blog

Saturday, 8 December 2018