
I'm assessing how well NLTK can process Arabic text, as part of a research project on sentiment analysis and extraction.

My questions are as follows:

  1. Is NLTK capable of handling and analyzing Arabic text?
  2. Is Python capable of manipulating/tokenizing Arabic text?
  3. Will I be able to parse and store Arabic text using Python?

If Python and NLTK aren't the right tools for this job, what tools would you recommend (if any exist)?

Thank you.


EDIT

Based on research:

  1. NLTK is only capable of stemming Arabic text: Link
  2. Python is capable of handling Arabic text since it supports Unicode (including the UTF-8 encoding): Link
  3. Parsing and lemmatization of Arabic text can be done using the statistical parser from the Stanford Natural Language Processing Group (SNLPG): Link
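
To make point 2 concrete, here is a minimal sketch of tokenizing and storing Arabic text with nothing but the Python standard library. It assumes Python 3, where `str` is Unicode and `\w` in `re` matches Arabic letters. The helper names (`strip_diacritics`, `tokenize`, `save_tokens`, `load_tokens`) are made up for illustration and are not part of NLTK or any other library. It is a rough starting point, not a full Arabic preprocessing pipeline.

```python
import re
import unicodedata
from pathlib import Path

# In Python 3, \w is Unicode-aware for str patterns, so it matches Arabic
# letters but not punctuation such as the Arabic question mark (U+061F).
WORD = re.compile(r"\w+")


def strip_diacritics(text):
    """Drop combining marks (e.g. Arabic short-vowel marks) from text."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def tokenize(text):
    """Split text into word tokens after removing diacritics."""
    return WORD.findall(strip_diacritics(text))


def save_tokens(tokens, path):
    """Write one token per line, explicitly encoded as UTF-8."""
    Path(path).write_text("\n".join(tokens), encoding="utf-8")


def load_tokens(path):
    """Read tokens back from a UTF-8 file written by save_tokens."""
    return Path(path).read_text(encoding="utf-8").split("\n")


if __name__ == "__main__":
    sample = "هل تدعم بايثون اللغة العربية؟"
    for token in tokenize(sample):
        print(token)
```

Diacritics are stripped before matching because combining marks are not matched by `\w`, so a vocalized word would otherwise be split into fragments. Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding when storing the text.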
Sнаđошƒаӽ
Bassem

1 Answer


A simple Google search led to these links:

Arabic Natural Language Processing

Using Python with the Quranic Arabic Corpus

HOWTO: Working with Python, Unicode, and Arabic

Are any of these useful?

Boris Gorelik
  • Thank you for your contribution. However, my question requires an answer based on experience with the topic above. I searched extensively and found many lexical parsers that can parse Arabic sentences based on the Penn Arabic Treebank, but nothing regarding text analysis and sentiment extraction. I will leave this question unanswered for a while in case someone else can contribute their knowledge. If not, I will accept yours as the correct answer. – Bassem Sep 12 '11 at 17:09
  • You are welcome. You might consider editing your question to contain all the info you just provided in this comment. – Boris Gorelik Sep 12 '11 at 20:24
  • @Bassem, I know this is an old post; however, I'd like to know whether you found a solution for extracting sentiments. – Frumples Jun 16 '16 at 19:10
  • @Frumples I haven't found an off-the-shelf engine; we ended up building a proprietary solution, which was never released. – Bassem Jun 16 '16 at 19:11