Sunday, November 11, 2018

Paper Review 6: PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields

In this post, the paper "PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields" is summarized.


Weijing Huang, Tengjiao Wang, Wei Chen, Siyuan Jiang, Kam-Fai Wong. PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 521–526, Melbourne, Australia, July 15-20, 2018. Association for Computational Linguistics.


Summary:

PhraseCTM is a novel method for discovering correlated topics at the phrase level.

The work is carried out in two stages:
  •  Training the model, using Markov Random Fields to link phrases to their component words when they are semantically coherent.
  •  Generating the correlation of topics from PhraseCTM, then evaluating the model with a quantitative experiment and a human study.


What problem does CTM solve?

A topic is easier to understand when it is modeled on phrases ("grounding conductor, grounding wire, aluminum wiring") than when it is modeled on words ("ground, wire, use, power, cable, wires"), which lose the context. However, phrase-level topic modeling becomes complex at large scale.
CTM therefore applies a correlation structure to capture the relationships between topics and to group similar topics together.
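
To make the idea of a correlation structure concrete, here is a minimal sketch, assuming the standard CTM formulation in which a document's topic proportions are obtained by passing a Gaussian vector through a softmax, so that the Gaussian covariance matrix carries the correlation between topics. The covariance values, the threshold of 0.5, and all function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def softmax(eta):
    """Map a real-valued vector to topic proportions that sum to 1."""
    m = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in eta]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_from_covariance(sigma):
    """Turn a covariance matrix into a correlation matrix."""
    n = len(sigma)
    std = [math.sqrt(sigma[i][i]) for i in range(n)]
    return [[sigma[i][j] / (std[i] * std[j]) for j in range(n)]
            for i in range(n)]


def correlated_pairs(corr, threshold):
    """List topic pairs (i, j), i < j, whose correlation reaches the threshold."""
    n = len(corr)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if corr[i][j] >= threshold]


# Hypothetical covariance between three topics (illustrative values only).
sigma = [[2.0, 1.2, 0.0],
         [1.2, 2.0, 0.2],
         [0.0, 0.2, 1.0]]

theta = softmax([0.5, -1.0, 2.0])           # topic proportions for one document
corr = correlation_from_covariance(sigma)   # topic-to-topic correlations
groups = correlated_pairs(corr, threshold=0.5)
```

With these made-up numbers, only topics 0 and 1 are reported as correlated, which is the kind of information used to group similar topics.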

When does CTM perform well?

Each document contains far fewer phrases than words, and CTM needs more contextual information to build a well-performing model. It is therefore not applied to short documents.

How does PhraseCTM work?

1. Training PhraseCTM:
 Transform each document into words and semantically coherent phrases.
 Link phrases to their component words:
  •    Calculate the NPMI metric.
  •    Define a threshold to decide whether a phrase is linked to its component words.
  •    Double-count each phrase in two parts: once as the phrase itself and once as its component words.
  •    Model the generation of words and phrases simultaneously by linking the phrases and their component words within a Markov Random Field.

2. Generating the correlation of topics:
The method is evaluated on five datasets through a quantitative experiment and a human study.

Conclusion:
PhraseCTM helps find the topics of a corpus, and it has been shown to produce high-quality phrase-level topics.


  


Presentation: Dorra EL MEKKI

Link: presentation_dorraElMekki