I am trying to split a chunk at the position of a colon (:) in NLTK, but it seems to be a special case. In a normal regex I can just put it in a character class, [:], with no problems.
But in NLTK, no matter what I do, RegexpParser does not accept it.
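For comparison, this is the plain-regex behaviour I mean. It is a minimal sketch using only Python's re module, and the sample string is made up for illustration:

```python
import re

# Made-up sample text, only for illustration.
text = "Rapunzel: let down her long golden hair"

# In a plain regex the colon needs no escaping;
# the character class [:] matches it literally.
parts = re.split(r"[:]", text)
print(parts)  # ['Rapunzel', ' let down her long golden hair']
```

With RegexpParser the equivalent attempt fails. Here is the grammar that works, without a split at the colon: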
from nltk import RegexpParser
grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN>|<NNP.*><\:><VBD>} # chunk (Rapunzel + : + let) together
{<NNP>+}
<.*>}{<VBD.*>
"""
cp = RegexpParser(grammar)
sentence = [("Rapunzel", "NNP"), (":",":"), ("let", "VBD"), ("down", "RP"), ("her", "PP$"), ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]
print(cp.parse(sentence))
The code above does build a chunk that includes the colon. The <.*>}{<VBD.*> line then splits the chunk (Rapunzel + : + let) at the position before let. If I take out that split rule and replace it with one that splits at the colon, I get an error:
from nltk import RegexpParser
grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN>|<NNP.*><\:><VBD>} # chunk (Rapunzel + : + let) together
{<NNP>+}
<.*>}{<\:.*>
"""
cp = RegexpParser(grammar)
sentence = [("Rapunzel", "NNP"), (":",":"), ("let", "VBD"), ("down", "RP"), ("her", "PP$"), ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]
print(cp.parse(sentence))
ValueError: Illegal chunk pattern: >
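My only guess so far, and it is an assumption I have not confirmed in the NLTK source, is that the grammar reader treats the first colon on every line as the separator between a chunk label (like NP) and its rule. The sketch below imitates that idea with plain str.partition; split_label is a hypothetical helper of mine, not an NLTK function:

```python
def split_label(line):
    """Hypothetical helper: cut a grammar line at its first colon.

    This only imitates what I suspect the grammar reader does;
    it is not NLTK code.
    """
    label, _sep, rest = line.partition(":")
    return label.strip(), rest.strip()


# A labelled line splits cleanly into a label and a complete rule.
labelled = split_label(r"NP: {<NNP>+}")

# The split rule is also cut at its colon, backslash or not.
broken = split_label(r"<.*>}{<\:.*>")

print(labelled)
print(broken)
```

If that guess is right, what is left after the colon is no longer a complete rule, which would at least be consistent with the ValueError.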
Can anyone explain how to do this? I tried Google and went through the docs, but I am none the wiser. I can deal with this after chunking with no problem, but I would like to know why it fails and how to do it properly. :-)