
How can we write Sanskrit grammar rules for parsing in NLTK Python? Is there any tagged corpus available in Python NLTK?

I tried writing a grammar in the usual way:

grammar = CFG.fromstring("""
S -> NP VP
PP -> P NP
NP ->  NN JJ| NNP VP| 'I'
VP -> V NP | VP PP
NN -> u'बालः' | u'पुस्तकं'|u'कागदम्'
VP -> u'भजति'|u'अधावत्' |u'अर्चयन्ति' 
NNP -> u'हरिं '
""")

But it raises the following error:

File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring
    encoding=encoding)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1245, in read_grammar
    lines = input.split('\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 76: ordinal not in range(128)
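
The traceback is consistent with a UTF-8 byte string being decoded implicitly with the ASCII codec, which Python 2 does when a plain `str` holding Devanagari bytes is combined with a `unicode` value. A minimal sketch that reproduces the same message in Python 3, where the decode has to be written out explicitly (the variable names are illustrative only):

```python
# Devanagari code points (U+0900 to U+097F) encode to three UTF-8 bytes,
# and the first of those bytes is always 0xe0.
word = "बालः"
raw = word.encode("utf-8")

try:
    # This mirrors what Python 2 attempts implicitly for a byte string.
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as err:
    message = str(err)

print(message)

# Decoding with the matching codec round-trips cleanly.
assert raw.decode("utf-8") == word
```

Under Python 2, handing `CFG.fromstring` a `unicode` literal (a `u` prefix on the whole triple-quoted string) would probably avoid the implicit decode, though that is an assumption and was not tested here.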

I started with Python 3, but even after installing the nltk package, importing it fails with `ImportError: No module named 'nltk'`. Can anyone tell me how to install NLTK for Python 3, and why I get this error?
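
An `ImportError` right after a successful install often means the package was installed for a different interpreter than the one running the script. A small diagnostic sketch using only the standard library (the helper name `describe_module` is made up for illustration):

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter path, module location or None) for a top-level module."""
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


interpreter, location = describe_module("nltk")
print("interpreter:", interpreter)
print("nltk found at:", location)
```

If the location prints as `None`, running pip through that same interpreter (for example `python3 -m pip install nltk`) ties the install to the interpreter that will do the import.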

James K
Sreedeepa
  • Consider switching to Python 3 for fewer Unicode-related pains. – Frames Catherine White Oct 13 '16 at 07:24
  • Or if you insist on using Python 2, there are some conventions you need to adhere to in order to use Unicode in your source files. http://stackoverflow.com/questions/6289474/working-with-utf-8-encoding-in-python-source – tripleee Oct 13 '16 at 07:30
  • Thanks for the reply. I already included the encoding declaration `# -*- coding: utf-8 -*-` in my code, yet it still returns that error. – Sreedeepa Oct 13 '16 at 09:57
  • Actually, I started with Python 3, but even after installing the nltk package, it returns `ImportError: No module named 'nltk'`. Can anyone tell me the solution and why it gives this error message? – Sreedeepa Oct 13 '16 at 09:59
  • How did you install nltk? On my computer running `pip3.5 install nltk` worked fine. – rassar Oct 15 '16 at 19:31
  • @Sreedeepa It's not only about the setting in your text editor. Note how the post linked above makes use of `decode`, etc. Or, if you are reading in or writing out files, make sure to use the [codecs](https://docs.python.org/2/library/codecs.html) module. – patrick Oct 15 '16 at 19:47
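
Putting the comments together, here is a sketch of the grammar under Python 3, assuming NLTK is installed for the interpreter in use. The `u` prefixes are dropped because they are Python syntax rather than part of NLTK's grammar notation, and the verbs get their own `V` category. The rules are simplified to a subject-object-verb pattern, which is an illustrative assumption and not a description of Sanskrit syntax.

```python
import nltk

# In Python 3 every string literal is Unicode, so the Devanagari
# terminals can be written directly inside the quoted grammar.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> NN | NNP
VP -> V | NP V
NN -> 'बालः' | 'पुस्तकं' | 'कागदम्'
NNP -> 'हरिं'
V -> 'भजति' | 'अधावत्' | 'अर्चयन्ति'
""")

parser = nltk.ChartParser(grammar)

# Whitespace tokenisation is enough for this toy sentence.
tokens = "बालः हरिं भजति".split()
trees = list(parser.parse(tokens))

for tree in trees:
    print(tree)
```

Each token has to match a terminal exactly, so stray spaces inside the quotes (as in `'हरिं '` in the question) would stop that word from being recognised.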
