I have this tree returned to me by NLTK's ne_chunk:

Tree('S', [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), (',', ','), Tree('PERSON', [('Stackoverflow', 'NNP'), ('Users', 'NNP')]), ('.', '.')])

I can turn this into a nested Python list like so:

import ast   # standard library; needed for literal_eval below
import nltk

sentence = "This is a test, Stackoverflow Users."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
entities = nltk.chunk.ne_chunk(tagged)
tree = repr(entities)  # this variable is the tree that is returned to me
# Below this point it's about turning the tree into a Python list:
# strip the outer "Tree(" and ")", then rewrite every tuple/Tree as a list literal.
tree = ("[" + tree[5:-1] + "]").replace("Tree", "").replace(")", "]").replace("(", "[")
tree = ast.literal_eval(tree)

Now the tree variable is this:

['S', [['This', 'DT'], ['is', 'VBZ'], ['a', 'DT'], ['test', 'NN'], [',', ','], ['ORGANIZATION', [['Stackoverflow', 'NNP']]], ['users', 'NNS'], ['.', '.']]]

When I iterate through the list to rebuild the sentence as a string, I get

"This is a test, ORGANIZATION."

instead of the desired

"This is a test, Stackoverflow users."

I cannot simply use the sentence variable; I need to be able to get the sentence back from this list of lists. Any code snippets or suggestions would be greatly appreciated.

Robbie Barrat
  • I had to install nltk and download some packages using nltk.download(). Now I am stuck. What is ast? It is not defined in your code. Can you edit your question? – Ohumeronen Jul 19 '16 at 13:57
  • Sorry! Try adding "import ast" to the top of your code; it's included with Python. – Robbie Barrat Jul 19 '16 at 14:06

1 Answer

>>> from nltk import Tree
>>> yourtree = Tree('S', [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), (',', ','), Tree('PERSON', [('Stackoverflow', 'NNP'), ('Users', 'NNP')]), ('.', '.')])
>>> yourtree.leaves()
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), (',', ','), ('Stackoverflow', 'NNP'), ('Users', 'NNP'), ('.', '.')]
>>> tokens, pos = zip(*yourtree.leaves())
>>> tokens
('This', 'is', 'a', 'test', ',', 'Stackoverflow', 'Users', '.')
>>> pos
('DT', 'VBZ', 'DT', 'NN', ',', 'NNP', 'NNP', '.')

See also: How to Traverse an NLTK Tree object?

alvas