International Journal of
Computer Engineering Research

  • Abbreviation: Int. J. Comput. Eng. Res.
  • Language: English
  • ISSN: 2141-6494
  • DOI: 10.5897/IJCER
  • Start Year: 2010
  • Published Articles: 33

Full Length Research Paper

Hidden Markov model based Arabic morphological analyzer

A. F. Alajmi*, E. M. Saad and M. H. Awadalla
Communication and Electronics Department, Faculty of Engineering, Helwan University, Egypt.
Email: [email protected]. uk

  •  Accepted: 24 February 2011
  •  Published: 30 March 2011

Abstract

 

Natural language processing tasks include summarization, machine translation, question understanding, and part-of-speech tagging. To accomplish these tasks, a proper language representation must be defined, and some systems use roots and stems as that representation. A word must therefore be processed to extract its root or stem. This paper presents a new technique that extracts word weights by stripping prefixes and suffixes from a given word. The technique is based on a hidden Markov model (HMM). A path from the start state to the end state represents a word, and each state constitutes letters of the word. The states are prefixes, weights, and suffixes. The selected path is the one with the highest likelihood for the word. The approach achieves a promising performance of 95%.

 

Key words: Natural language processing, morphology, hidden Markov model, stem.
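
The path idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model. It assumes one hidden state per letter (P for prefix, W for weight, S for suffix), left-to-right transitions, hand-set toy probabilities, and Latin transliteration in place of Arabic script. The names `segment`, `emit`, `START`, `TRANS`, and `FAVOURED` are hypothetical. The best path is found with the standard Viterbi algorithm in log space.

```python
import math

# Toy left-to-right HMM: P = prefix, W = weight (stem pattern), S = suffix.
# All probabilities are hand-set assumptions for illustration, not trained values.
STATES = ("P", "W", "S")
START = {"P": 0.5, "W": 0.5, "S": 0.0}
TRANS = {
    "P": {"P": 0.5, "W": 0.5, "S": 0.0},  # a prefix must be followed by the weight
    "W": {"P": 0.0, "W": 0.7, "S": 0.3},
    "S": {"P": 0.0, "W": 0.0, "S": 1.0},  # once in the suffix, stay there
}
FINAL = ("W", "S")                  # a word may not end inside a prefix
FAVOURED = {"P": "al", "S": "un"}   # letters each affix state prefers (assumed)


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def emit(state, letter):
    """Emission probability of a lowercase Latin letter in a given state."""
    favoured = FAVOURED.get(state, "")
    if not favoured:                       # W: uniform over 26 letters
        return 1 / 26
    if letter in favoured:                 # 80% of the mass on favoured letters
        return 0.8 / len(favoured)
    return 0.2 / (26 - len(favoured))      # remaining mass on all other letters


def segment(word):
    """Return (prefix, weight, suffix) along the most likely state path."""
    if not word:
        return "", "", ""
    scores = [{s: log(START[s]) + log(emit(s, word[0])) for s in STATES}]
    backpointers = []
    for letter in word[1:]:
        prev, row, ptr = scores[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            row[s] = prev[best] + log(TRANS[best][s]) + log(emit(s, letter))
            ptr[s] = best
        scores.append(row)
        backpointers.append(ptr)
    # Backtrack from the best admissible final state.
    state = max(FINAL, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    tags = "".join(reversed(path))
    n_pre, n_suf = tags.count("P"), tags.count("S")
    return word[:n_pre], word[n_pre:len(word) - n_suf], word[len(word) - n_suf:]


print(segment("almuslimun"))  # ('al', 'muslim', 'un')
print(segment("kitab"))       # ('', 'kitab', '')
```

Because the transitions only move forward (prefix, then weight, then suffix), every path splits the word into three contiguous parts. Choosing the highest-scoring path is therefore the same as choosing the most likely place to strip the affixes.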