Conference papers

Amharic Document Representation for Adhoc Retrieval

Abstract : Amharic is the official language of the government of Ethiopia, a country with an estimated population of over 110 million. Like other Semitic languages, Amharic has a complex morphology in which thousands of words are generated from a single root form through inflection and derivation. This makes the development of tools for Amharic natural language processing a non-trivial task, and Amharic adhoc retrieval in particular faces difficulties due to this morphological structure. In this paper, we investigate the impact of morphological features on the representation of Amharic documents and queries for adhoc retrieval. We analyze the effects of stem-based and root-based approaches on Amharic adhoc retrieval effectiveness. Various experiments are conducted on a TREC-like Amharic information retrieval test collection using a standard evaluation framework and measures. The findings show that a root-based approach outperforms the conventional stem-based approach that prevails in many other languages.

Cited literature : 19 references

https://hal.archives-ouvertes.fr/hal-02960435
Contributor : Josiane Mothe
Submitted on : Wednesday, October 7, 2020 - 4:49:33 PM
Last modification on : Wednesday, October 21, 2020 - 9:13:06 AM

File

TilahunKDIR2020-september28.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02960435, version 1

Citation

Tilahun Yeshambel, Josiane Mothe, Yaregal Assabie. Amharic Document Representation for Adhoc Retrieval. KDIR 2020, Nov 2020, Online conference, Hungary. ⟨hal-02960435⟩
