Thursday, 22 November 2018

Paper Review 7: A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

In this post, the research article "A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences" is summarized.
Ming Che Lee: Department of Computer and Communication Engineering, Ming Chuan University, Taiwan.
Jia Wei Chang: Department of Engineering Science, National Cheng Kung University, Taiwan.
Tung Cheng Hsieh: Department of Visual Communication Design, Hsuan Chuang University, Taiwan.
Published 10 April 2014

Summary:


Since part of our project is about finding the similarity between terms (sentences) and documents (texts; job descriptions), we chose this published paper, which discusses most of the techniques and approaches known from the early days of natural language processing up to the time the article was written.

The paper first presents the main existing models, such as Latent Semantic Analysis/Indexing (LSA/LSI), Hyperspace Analogue to Language (HAL), Probabilistic Latent Semantic Analysis/Indexing (PLSA/PLSI), and the Vector Space Model (VSM), along with a brief comparison. The conclusion is that these approaches compute similarity from the number of terms shared between documents while overlooking the syntactic structure of sentences, so several disadvantages arise when they are applied directly to short texts or sentences. The paper then presents a new approach: a grammar- and semantic-corpus-based similarity algorithm for natural language sentences.
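To make the limitation of term-overlap models concrete, here is a minimal sketch (not taken from the paper) of a bag-of-words VSM with cosine similarity. Two sentences built from the same words but with opposite meanings receive a perfect score, because the representation ignores syntax entirely.

```python
# Minimal sketch (illustration only): bag-of-words vector space model (VSM)
# with cosine similarity, showing why term-overlap measures ignore syntax.
from collections import Counter
import math

def cosine_similarity(s1: str, s2: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

# Same words, different syntax and meaning, yet the score is 1.0.
print(cosine_similarity("the dog bites the man", "the man bites the dog"))
```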

This new approach addresses the limitations of the existing approaches by using grammatical rules and the WordNet ontology.

Traditional information retrieval technologies cannot always determine a good match when there is no obvious relation or concept overlap between two natural language sentences. Some approaches deal with this problem by taking word order into account and evaluating semantic vectors; however, they are hard to apply to sentences with complex syntax, as well as to long sentences and sentences with arbitrary patterns and grammars.

The proposed approach takes advantage of a corpus-based ontology and grammatical rules to overcome this problem: a set of grammar matrices is built to represent the relationships (correlations) between pairs of sentences, instead of considering common words or word order. The size of the set is limited to the maximum number of selected grammar links. The latent semantics of words are computed via a WordNet similarity measure over semantic trees, which increases the chance of finding a semantic relation between any pair of nouns or verbs.
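The paper defines its own measure over WordNet-based semantic trees; as an illustration of the general idea only, the sketch below uses NLTK's WordNet interface with the Wu-Palmer measure (an assumption, not necessarily the measure used in the paper) to score noun pairs that share no surface form.

```python
# Illustrative sketch only: word-to-word similarity over the WordNet ontology
# using NLTK's Wu-Palmer measure. The paper's own measure is built on its
# WordNet-based semantic trees; this stand-in just shows the general idea.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def wordnet_similarity(word1: str, word2: str, pos=wn.NOUN) -> float:
    """Best Wu-Palmer similarity over all synset pairs of the two words."""
    synsets1 = wn.synsets(word1, pos=pos)
    synsets2 = wn.synsets(word2, pos=pos)
    scores = [s1.wup_similarity(s2) for s1 in synsets1 for s2 in synsets2]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

# Nouns with no surface overlap can still be semantically close.
print(wordnet_similarity("boy", "girl"))     # high score
print(wordnet_similarity("boy", "teacher"))  # lower, but non-zero
```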

Finally, the results demonstrate that the proposed method performs very well both on sentence similarity and on the task of paraphrase recognition.


Presentation: Dorra EL MEKKI

Link: presentation_dorraElMekki