I'm trying to use breadth_first to search a ParentedTree, first for a specific leaf word and then for a certain label (NP). I'd really rather not implement it myself if there's already a method for it. This is what I've tried (including how I made the tree, in case that's where I messed up):

import nltk
from nltk.util import breadth_first

grammar = nltk.data.load("/path/to/grammar.cfg")
parser = nltk.parse.EarleyChartParser(grammar)
sent = "They are happy people"
parse1 = list(parser.parse(sent.split()))
tree1 = nltk.tree.ParentedTree.convert(parse1[0])
bf = breadth_first(tree1)

This gives me a generator object, but I'm not sure how to use it to search for what I want (the pronoun "They"). I tried doing a simple "for node in bf: print(node)" and it printed every single letter of the string on a line by itself, repeating forever, until I had to close the window.

I've read the docs and I've done a lot of googling, but I can't find an example of it actually being used for searching. What am I doing wrong?

Erica
  • Duplicate of https://stackoverflow.com/questions/31689621/how-to-traverse-an-nltk-tree-object ? – alvas Mar 07 '18 at 23:04
  • I looked at that question; my question is specifically about how to use the NLTK method breadth_first. That question is about traversing depth first, anyway. – Erica Mar 08 '18 at 01:57
  • Interesting, is the recursion still there without explicit break? If so, then it's a bug =) – alvas Mar 08 '18 at 06:05

1 Answer

The nltk.util.breadth_first function does a breadth-first traversal of the tree you pass in. To use it as a search mechanism, you need to check each node the generator yields against the value you're looking for.

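If you iterate through the generator returned by breadth_first and print each result, you can see that it visits every node in the tree in BFS order, then the leaf strings, and then the individual characters of those strings. That last part happens because breadth_first finds a node's children by iterating over it by default, and Python strings are iterable: each word yields its characters, and each single character yields itself again. Without an explicit break the loop never finishes, which is the endless stream of letters you saw.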
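The repetition can be reproduced without a parser. The `bfs` function below is a hypothetical, simplified stand-in for a breadth-first generator whose child function defaults to `iter`; it is not NLTK's actual source, only a sketch of the mechanism.

```python
from collections import deque
from itertools import islice


def bfs(root, children=iter):
    """Generic breadth-first generator (illustrative stand-in, not NLTK's code)."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        try:
            # Strings are iterable, so their characters are queued as "children".
            queue.extend(children(node))
        except TypeError:
            # Non-iterable nodes have no children.
            pass


# A one-character string iterates to itself, so the queue never empties.
# islice caps the otherwise endless generator at six items.
first_six = list(islice(bfs("Hi"), 6))
print(first_six)  # ['Hi', 'H', 'i', 'H', 'i', 'H']
```

Capping the generator with `itertools.islice` is one way to inspect it safely; the explicit counter and break in the transcript further down does the same job.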

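So for your case, loop over this generator and, at each node, check whether you've reached the label or leaf token you're searching for. Stop as soon as you find a match, and keep in mind that leaf tokens are plain str objects while everything else is a Tree, so check the type before calling label().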
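As a rough sketch of that search, assuming a hand-written parse of the question's sentence in place of the grammar file: `tree_children`, `find_leaf`, and `find_label` are hypothetical helper names, not part of NLTK. The `children` argument of breadth_first is used here so that leaf strings are never split into characters and the traversal terminates even when nothing matches.

```python
from nltk.tree import ParentedTree, Tree
from nltk.util import breadth_first

# Hand-written parse for illustration; a real one would come from your parser.
tree = ParentedTree.fromstring(
    "(S (NP (PRP They)) (VP (VBP are) (NP (JJ happy) (NNS people))))"
)


def tree_children(node):
    """Return children only for Tree nodes, so leaf strings are never expanded."""
    return iter(node) if isinstance(node, Tree) else iter(())


def find_leaf(root, word):
    """Return the first subtree (in BFS order) that directly holds `word`, or None."""
    for node in breadth_first(root, children=tree_children):
        if isinstance(node, Tree) and any(
            isinstance(child, str) and child == word for child in node
        ):
            return node
    return None


def find_label(root, label):
    """Return the first subtree (in BFS order) with the given label, or None."""
    for node in breadth_first(root, children=tree_children):
        if isinstance(node, Tree) and node.label() == label:
            return node
    return None


pronoun = find_leaf(tree, "They")   # the (PRP They) subtree
first_np = find_label(tree, "NP")   # the shallowest NP
print(pronoun, pronoun.parent().label())
print(first_np)
```

Returning the subtree above the word, rather than the bare string, is a design choice: leaf strings carry no parent pointer, whereas a ParentedTree node exposes parent() for walking back up toward an enclosing NP.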

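Here's a sample sentence, its parse tree from nltk, and a traversal through the tree. The counter stops the loop after 12 nodes, since it would otherwise keep cycling through the characters indefinitely.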

Good luck!

>>> sentence
'They capture mice in the cells'
>>> parse
Tree('S', [Tree('NP', [Tree('PRP', ['They'])]), Tree('VP', [Tree('VBP', ['capture']), Tree('NP', [Tree('Nom', [Tree('Nom', [Tree('NNS', ['mice'])]), Tree('PP', [Tree('Prep', ['in']), Tree('NP', [Tree('Det', ['the']), Tree('Nom', [Tree('NNS', ['cells'])])])])])])])])
>>> i = 0
>>> for node in breadth_first(parse):
...     print("*"*10)
...     print(node)
...     print(type(node))
...     if i > 10:
...         break
...     i += 1
...
**********
(S
  (NP (PRP They))
  (VP
    (VBP capture)
    (NP
      (Nom
        (Nom (NNS mice))
        (PP (Prep in) (NP (Det the) (Nom (NNS cells))))))))
<class 'nltk.tree.Tree'>
**********
(NP (PRP They))
<class 'nltk.tree.Tree'>
**********
(VP
  (VBP capture)
  (NP
    (Nom
      (Nom (NNS mice))
      (PP (Prep in) (NP (Det the) (Nom (NNS cells)))))))
<class 'nltk.tree.Tree'>
**********
(PRP They)
<class 'nltk.tree.Tree'>
**********
(VBP capture)
<class 'nltk.tree.Tree'>
**********
(NP
  (Nom
    (Nom (NNS mice))
    (PP (Prep in) (NP (Det the) (Nom (NNS cells))))))
<class 'nltk.tree.Tree'>
**********
They
<class 'str'>
**********
capture
<class 'str'>
**********
(Nom
  (Nom (NNS mice))
  (PP (Prep in) (NP (Det the) (Nom (NNS cells)))))
<class 'nltk.tree.Tree'>
**********
T
<class 'str'>
**********
h
<class 'str'>
**********
e
<class 'str'>
tevn