
I am trying to get synonyms for Arabic words in a sentence.

If the word is in English, it works perfectly and the results are displayed in Arabic. I was wondering if it's possible to get the synonyms of an Arabic word directly, without writing it in English first.

I tried that, but it didn't work. I would also prefer to work without tashkeel: انتظار instead of اِنْتِظار

from nltk.corpus import wordnet as omw
jan = omw.synsets('انتظار ')[0]
print(jan)
print(jan.lemma_names(lang='arb'))
Assem
IS92
  • It's an old answer but tell me if it doesn't work: http://stackoverflow.com/questions/16096559/arabic-wordnet-with-not-formatted-words – alvas Jan 05 '16 at 20:48
  • no I tried it but it didn't work at all – IS92 Jan 05 '16 at 23:18
  • The nltk now provides the Open Multilingual Wordnet, which includes Arabic. See https://stackoverflow.com/questions/45156965/nltk-omw-wordnet-with-arabic-language. – alexis Jul 18 '17 at 08:10

1 Answer


The WordNet used in NLTK doesn't support Arabic. If you are looking for Arabic WordNet (AWN), that is a totally different thing.

For Arabic wordnet, download:

You run it with:

$ python AWNDatabaseManagement.py -i upc_db.xml

Now, to get something like wn.synset('إنتظار'): Arabic WordNet has a function wn.get_synsets_from_word(word), but it returns offsets rather than synset objects. It also accepts words only as they are vocalized in the database. For example, you should use جَمِيل for جميل:

>>> wn.get_synsets_from_word(u"جَمِيل")
[(u'a', u'300218842')]

300218842 is the offset of the synset for جميل.
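Because the lookup only matches vocalized forms, one possible workaround for unvocalized input is to index the known vocalized words by their tashkeel-stripped form and look up candidates from there. The following is a minimal sketch under assumptions: `strip_tashkeel` and `build_unvocalized_index` are illustrative names, not part of AWN, and the two-word list is a hypothetical stand-in for the words you would actually read from the AWN database.

```python
import re
from collections import defaultdict

# Arabic diacritics (tashkeel): U+064B..U+0652 (tanween, short vowels,
# shadda, sukun) plus U+0670 (superscript alef).
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkeel(word):
    """Return the word with its diacritics removed."""
    return TASHKEEL.sub("", word)


def build_unvocalized_index(vocalized_words):
    """Map each bare form to the set of vocalized forms it may stand for."""
    index = defaultdict(set)
    for word in vocalized_words:
        index[strip_tashkeel(word)].add(word)
    return index


# Hypothetical stand-in for the vocalized words stored in the database.
vocalized_words = [
    "\u062c\u064e\u0645\u0650\u064a\u0644",  # جَمِيل
    "\u0627\u0650\u0646\u0652\u062a\u0650\u0638\u0627\u0631",  # اِنْتِظار
]

index = build_unvocalized_index(vocalized_words)

# Vocalized candidates for the bare word جميل; each one could then be
# passed to wn.get_synsets_from_word.
candidates = sorted(index["\u062c\u0645\u064a\u0644"])
```

Several vocalized words can share one bare form, which is why the index maps to a set rather than a single string; the caller still has to decide which candidate is meant.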

I checked for the word إنتظار, and it seems it doesn't exist in AWN.

More details about using AWN to get synonyms here.

Assem
  • When I run it using $ python AWNDatabaseManagement.py -i upc_db.xml I got : but when I wrote: wn.get_synsets_from_word(u"جَمِيل") I got Traceback (most recent call last): File "/Users/InjySarhan/PycharmProjects/28Dec/AWNDatabaseManagement.py", line 398, in opts = processCmdlineOpts(sys.argv) File "/Users/InjySarhan/PycharmProjects/28Dec/AWNDatabaseManagement.py", line 320, in processCmdlineOpts if not opts.has_key('i'): AttributeError: 'dict' object has no attribute 'has_key' – IS92 Jan 07 '16 at 17:03
  • run it using python2 not python3 – Assem Jan 07 '16 at 18:37
  • I just tried it using python 2 , run it using $ python AWNDatabaseManagement.py -i upc_db.xml I got : but when I wrote: Synset('delay.n.01') [u'\u0627\u0650\u0646\u0652\u062a\u0650\u0638\u0627\u0631', u'\u062a\u0623\u062c\u0650\u064a\u0644', u'\u062a\u0623\u062e\u0650\u064a\u0631', u'\u062a\u0648\u0642\u0651\u064f\u0641'] Traceback (most recent call last): File "C:/Users/PycharmProjects20Dec.py" line 492, in wn.get_synsets_from_word("جَمِيل") AttributeError: 'WordNetCorpusReader' object has no attribute 'get_synsets_from_word' – IS92 Jan 15 '16 at 13:11
  • write `wn.get_synsets_from_word(u"جَمِيل")` at the end of `AWNDatabaseManagement.py` and execute it. – Assem Jan 15 '16 at 13:15
  • I followed ur answer in http://stackoverflow.com/questions/29522161/import-arabic-wordnet-in-python/29545179#29545179 but always get the same error – IS92 Jan 15 '16 at 13:17
  • write `wn.get_synsets_from_word(u"جَمِيل")` inside`AWNDatabaseManagement.py ` at the end and execute it. The same for all instructions. If it worked, we'll discuss how to execute from a different place. – Assem Jan 15 '16 at 13:21
  • when I did that and ran it again $ python AWNDatabaseManagement.py -i upc_db.xml I got: line 407 non-ASCII character '\xd8' in file AWNDatabaseManagement.py on line 407, but no encoding declared – IS92 Jan 15 '16 at 13:27
  • @I.Abdelsalam add this as a second line in the top `# -*- coding: utf-8 -*-` and re-run – Assem Jan 15 '16 at 13:28
  • AWNDatabaseManagement.py ran once I added # -*- coding: utf-8 -*-, but when I run the original file I still get the same error: AttributeError: 'WordNetCorpusReader' object has no attribute 'get_synsets_from_word' – IS92 Jan 15 '16 at 14:39
  • You are still using the `wn` from `nltk`; don't. Use the `wn` inside `AWNDatabaseManagement.py`. What code are you running? – Assem Jan 15 '16 at 14:48
  • Please mark this as accepted, and ask the new issue as a new question then send me link I will give you a clear answer. – Assem Jan 15 '16 at 15:04
  • done http://stackoverflow.com/questions/34820968/arabic-word-net-synonyms-in-python thanks – IS92 Jan 15 '16 at 22:36
  • @bigOTHER you say: "Wordnet used in nltk doesnt support arabic." but it does in 3rd version using [Open Multilingual Wordnet](http://compling.hss.ntu.edu.sg/omw/). [sample](https://groups.google.com/forum/embed/#!topic/nltk-users/estFavchR34) – ARZ Feb 14 '16 at 06:09
  • @ARZ Thanks for letting me know – Assem Feb 14 '16 at 08:10
  • This answer is out of date. The nltk now includes a version of the Open Multilingual Wordnet that includes Arabic. – alexis Jul 18 '17 at 08:11
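As the later comments point out, recent NLTK versions ship the Open Multilingual Wordnet, so `synsets` and `lemma_names` accept `lang='arb'`. The lemma names quoted in the comments above come back vocalized, so an unvocalized query may need to be matched on the stripped form. The following is a rough sketch, not a confirmed recipe: `arabic_synonyms` is a hypothetical helper, and it assumes the `wordnet` and `omw-1.4` (or, on older NLTK versions, `omw`) corpora have been downloaded.

```python
import re

# Arabic diacritics (tashkeel): U+064B..U+0652 plus U+0670.
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def arabic_synonyms(word, wn=None):
    """Return unvocalized Arabic lemma names that share a synset with `word`.

    `wn` defaults to NLTK's WordNet reader; it is a parameter so that the
    lookup logic can be exercised without the corpora installed.
    """
    if wn is None:
        # Assumption: nltk is installed and the corpora are downloaded,
        # e.g. nltk.download('wordnet') and nltk.download('omw-1.4').
        from nltk.corpus import wordnet as wn

    # Drop surrounding whitespace and any diacritics from the query.
    bare = TASHKEEL.sub("", word.strip())

    # Find the stored (possibly vocalized) lemmas whose bare form matches.
    matches = [
        lemma for lemma in wn.all_lemma_names(lang="arb")
        if TASHKEEL.sub("", lemma) == bare
    ]

    synonyms = set()
    for lemma in matches:
        for synset in wn.synsets(lemma, lang="arb"):
            for name in synset.lemma_names(lang="arb"):
                synonyms.add(TASHKEEL.sub("", name))

    synonyms.discard(bare)  # the query word is not its own synonym
    return sorted(synonyms)
```

With the corpora installed, a call such as `arabic_synonyms("انتظار")` would scan every Arabic lemma once per query; for repeated lookups it would be cheaper to build the stripped-form index a single time and reuse it.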