I'm trying to use the NLTK CFG parser, but I get the error "Grammar does not cover some of the input words". The code I'm using is:
import nltk
import codecs
strProductions = ''
f = codecs.open('C://nltk_data//corpora//CINTIL_TreeBank//producoes_S.txt', 'r',
                encoding='latin-1')
for line in f:
    strProductions = strProductions + line
f.close()
grammar = nltk.grammar.CFG.fromstring(strProductions)
cp = nltk.ChartParser(grammar)
print grammar

The printed grammar includes these productions:
S -> V PNT
V -> 'Choveu'
NP -> DEM N
PP -> P NP
P -> 'de'
NP -> N_
N_ -> N A
N -> 'crian\\xe7a'
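
I then build the token list and parse it, which raises the error shown after the code:
tokens = []
a = u'criança'
b = '.'
a = a.encode('latin-1')  # byte string, latin-1 encoded
tokens.append(a)
tokens.append(b)
for tree in cp.parse(tokens):
    print tree
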
C:\Anaconda2\lib\site-packages\nltk\grammar.pyc in check_coverage(self, tokens)
629 missing = ', '.join('%r' % (w,) for w in missing)
630 raise ValueError("Grammar does not cover some of the "
--> 631 "input words: %r." % missing)
632
633 def _calculate_grammar_forms(self):
ValueError: Grammar does not cover some of the input words: u"'crian\\xe7a'".
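
To make the error easier to reason about, here is a minimal sketch of what the coverage check appears to do, assuming it is a plain membership test of each token against the grammar's terminals. The helper `find_uncovered` and the terminal set are hypothetical stand-ins, not NLTK code. It illustrates one way a mismatch can occur: a latin-1 byte string does not match a terminal stored as a text string, even when both represent the same word.

```python
def find_uncovered(terminals, tokens):
    """Return the tokens that match none of the terminals.

    Hypothetical stand-in for the grammar's coverage check.
    """
    return [tok for tok in tokens if tok not in terminals]


# Assumed terminals, stored as text (unicode) strings.
terminals = {u'Choveu', u'de', u'crian\xe7a', u'.'}

text_token = u'crian\xe7a'                  # text string
byte_token = text_token.encode('latin-1')   # byte string for the same word

# Text token: every token is found among the terminals.
missing_text = find_uncovered(terminals, [text_token, u'.'])
print(missing_text)

# Byte token: it does not compare equal to the text terminal,
# so it is reported as uncovered.
missing_bytes = find_uncovered(terminals, [byte_token, u'.'])
print(missing_bytes)
```

If this assumption holds, comparing `repr()` of each token with `repr()` of the grammar's terminals should show whether the two sides differ in type or in escaping.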
Can someone help me identify what is happening?
Thanks in advance