I need to create a NOT condition as part of my grammar for NLTK's RegexpParser. I would like to chunk phrases with a structure like 'Coffee & TV', but the parser should not chunk them if a word tagged <IN> comes directly before the sequence. For example, 'in London and Paris' should not be chunked by the parser.
My code is as follows:
grammar = r'''NP: {(^<IN>)<NNP>+<CC><NN.*>+}'''
I tried the above grammar to solve the problem, but it does not work. Could someone please tell me what I am doing wrong?
Example:
import nltk

def parse_sentence(sentence):
    # Tokenize and POS-tag the sentence, then chunk it with the grammar.
    pos_sentence = nltk.pos_tag(nltk.word_tokenize(sentence))
    grammar = r'''NP: {<NNP>+<CC><NN.*>+}'''
    parser = nltk.RegexpParser(grammar)
    result = parser.parse(pos_sentence)
    print(result)

sentence1 = 'Who is the front man of the band that wrote Coffee & TV?'
parse_sentence(sentence1)

sentence2 = 'Who of those resting in Westminster Abbey wrote a book set in London and Paris?'
parse_sentence(sentence2)
Result for sentence1 is:
(S
  Who/WP
  is/VBZ
  the/DT
  front/JJ
  man/NN
  of/IN
  the/DT
  band/NN
  that/WDT
  wrote/VBD
  (NP Coffee/NNP &/CC TV/NN)
  ?/.)
Result for sentence2 is:
(S
  Who/WP
  of/IN
  those/DT
  resting/VBG
  in/IN
  Westminster/NNP
  Abbey/NNP
  wrote/VBD
  a/DT
  book/NN
  set/VBN
  in/IN
  (NP London/NNP and/CC Paris/NNP)
  ?/.)
As can be seen, in both sentence1 and sentence2 the phrases 'Coffee & TV' and 'London and Paris' get chunked as a group, although I do not wish to chunk 'London and Paris'. One way of doing that would be to ignore those patterns which are preceded by an <IN> POS tag.
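To make the behaviour I am after concrete, here is a minimal sketch that works outside NLTK's grammar syntax: it joins the POS tags into one string and applies Python's own `re` module with a negative lookbehind. The helper name `find_chunks` is hypothetical, and the two tagged lists are copied from the results above. This is only a workaround under those assumptions, not a RegexpParser grammar.

```python
import re

# Tagged tokens copied from the parser output shown above.
tagged1 = [('Who', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('front', 'JJ'),
           ('man', 'NN'), ('of', 'IN'), ('the', 'DT'), ('band', 'NN'),
           ('that', 'WDT'), ('wrote', 'VBD'), ('Coffee', 'NNP'),
           ('&', 'CC'), ('TV', 'NN'), ('?', '.')]
tagged2 = [('Who', 'WP'), ('of', 'IN'), ('those', 'DT'), ('resting', 'VBG'),
           ('in', 'IN'), ('Westminster', 'NNP'), ('Abbey', 'NNP'),
           ('wrote', 'VBD'), ('a', 'DT'), ('book', 'NN'), ('set', 'VBN'),
           ('in', 'IN'), ('London', 'NNP'), ('and', 'CC'), ('Paris', 'NNP'),
           ('?', '.')]

# NNP+ CC NN.*+ that is NOT directly preceded by IN.
# The second lookbehind stops a match from starting in the middle of an
# NNP run, which would otherwise sidestep the IN check.
PATTERN = re.compile(r'(?<!<IN>)(?<!<NNP>)(?:<NNP>)+<CC>(?:<NN[^<>]*>)+')

def find_chunks(tagged_tokens):
    """Return the token spans whose tags match PATTERN (hypothetical helper)."""
    # Encode the tag sequence as one string, e.g. '<WP><VBZ><DT>...'.
    tags = ''.join('<%s>' % tag for _, tag in tagged_tokens)
    chunks = []
    for match in PATTERN.finditer(tags):
        # Each '<' marks one token, so counting them maps string
        # offsets back to token indices.
        start = tags.count('<', 0, match.start())
        length = match.group().count('<')
        chunks.append(tagged_tokens[start:start + length])
    return chunks

print(find_chunks(tagged1))
print(find_chunks(tagged2))
```

With the tags above, this returns the 'Coffee & TV' span for sentence1 and no span for sentence2, which is the result I would like the chunk grammar to produce.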
In a nutshell, I need to know how to add NOT (negation) conditions for POS tags in a RegexpParser grammar. The standard syntax of using '^' followed by the tag definition does not seem to work.