Conference papers

Étiquetage thématique automatisé de corpus par représentation sémantique (Automated thematic tagging of corpora by semantic representation)

Abstract : In scientific text corpora, some articles from different research communities are not tagged with the same keywords even when they share the same topic. This causes problems for information retrieval systems that rely on a limited number of tag variations, and thus lowers the chances of interdisciplinary exploration. Our approach automatically assigns a topic tag to articles by learning a classifier for each topic, based on the semantic representation of the titles and abstracts of already tagged articles. The approach requires much less computing power than topic modeling over millions of documents. In the proposed model, we use topic synonyms to retrieve additional semantically similar articles and merge them with the articles obtained by the topic classifier. The experiments show higher recall than two variants of the model: one that uses only the synonym set, and one that uses only the semantic representation of the text.
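The pipeline described in the abstract (a per-topic classifier, a synonym lookup, and a merge of the two result sets) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for the semantic representation, the nearest-centroid rule stands in for the learned classifier, and every name, the threshold value, and the sample articles are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic representation: bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_articles(articles, tagged_ids, synonyms, threshold=0.3):
    """Return the ids of articles assigned to one topic.

    articles:   dict mapping id -> title + abstract text
    tagged_ids: ids that already carry the topic tag (training examples)
    synonyms:   lowercase synonym phrases for the topic
    threshold:  hypothetical similarity cut-off for the classifier branch
    """
    # Classifier branch (assumption: nearest-centroid on the toy vectors).
    topic_vector = Counter()
    for article_id in tagged_ids:
        topic_vector.update(embed(articles[article_id]))
    by_classifier = {
        article_id
        for article_id, text in articles.items()
        if cosine(embed(text), topic_vector) >= threshold
    }
    # Synonym branch: plain substring match on the synonym set.
    by_synonyms = {
        article_id
        for article_id, text in articles.items()
        if any(phrase in text.lower() for phrase in synonyms)
    }
    # Merge both result sets, as the abstract describes.
    return by_classifier | by_synonyms


# Hypothetical sample data, for illustration only.
articles = {
    "a1": "Topic modeling of large text corpora with latent semantic analysis",
    "a2": "Latent semantic analysis for text retrieval across research communities",
    "a3": "Text mining of clinical notes for cohort discovery",
    "a4": "Protein folding dynamics in yeast cells",
}
merged = tag_articles(articles, tagged_ids={"a1"}, synonyms={"text mining"})
classifier_only = tag_articles(articles, tagged_ids={"a1"}, synonyms=set())
```

On this toy data the merged set contains every article that either branch retrieves on its own, which is the mechanism by which merging can raise recall. The sketch says nothing about the size of the effect reported in the paper.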

Cited literature : 11 references
Contributor : Fabrice Muhlenbach
Submitted on : Friday, December 8, 2017 - 4:08:05 PM
Last modification on : Thursday, April 30, 2020 - 10:12:02 AM


Files produced by the author(s)


  • HAL Id : hal-01659639, version 1


Lucie Martinet, Hussein Al-Natsheh, Fabien Rico, Fabrice Muhlenbach, Djamel Zighed. Étiquetage thématique automatisé de corpus par représentation sémantique. EGC 2018 - 18ème Conférence Internationale sur l'Extraction et la Gestion de Connaissances, Jan 2018, Paris-Nord, France. pp.1-6. ⟨hal-01659639⟩


