1

I am trying to get a flat tree from a tree structure like the one shown below.

[image: parse tree]

I want to get the whole tree as a single-line string like the one below, without hitting a "Bad tree detected" error:

( (S (NP-SBJ (NP (DT The) (JJ high) (JJ seven-day) )(PP (IN of) (NP (DT the) (CD 400) (NNS money) )))(VP (VBD was) (NP-PRD (CD 8.12) (NN %) )(, ,) (ADVP (RB down) (PP (IN from) (NP (CD 8.14) (NN %) ))))(. .) ))
Yaroslav Admin
bob
    Why do you want to do that? It just makes the tree harder to process. Trees are easy to work with and provide a lot of structure that you would otherwise have to re-invent from the text. – Ira Baxter Mar 15 '15 at 09:29

4 Answers

5

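You can convert the tree into a string with the str function, then split it on whitespace and join the pieces with single spaces, as follows:

# Collapse every run of whitespace (newlines, indentation) into one space
parse_string = ' '.join(str(tree).split())

print(parse_string)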
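To see what the split-and-join step does on its own, here is a minimal sketch that runs without NLTK. The multi-line string `pretty` is a hand-written stand-in for what str(tree) might return, not output captured from a real parser.

```python
# Hand-written stand-in for the multi-line output of str(tree)
# (an assumption for illustration, not real parser output).
pretty = """(S
  (NP (DT The) (NN cat))
  (VP (VBD sat))
  (. .))"""

# str.split() with no arguments splits on any run of whitespace,
# so newlines and indentation both disappear after the join.
flat = " ".join(pretty.split())
print(flat)
```

One caveat of this approach: any leaf that itself contains several consecutive spaces would also be collapsed to a single space.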
Truong-Son
3

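NLTK provides a Tree class for tree manipulation and node extraction:

from nltk.tree import Tree

# `trees` is assumed to be an iterable of parse trees produced earlier
for tr in trees:
    tr1 = str(tr)               # bracketed string form of the tree
    s1 = Tree.fromstring(tr1)   # parse the string back into a Tree
    s2 = s1.productions()       # list of grammar productions in the tree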
Ryan Vincent
bob
2

The documentation provides a pprint() method that renders the tree as a bracketed string. (In NLTK 3 and later, pformat() returns this string, while pprint() prints it.)

Parsing this sentence:

string = "My name is Ross and I am cool. What's going on world? I'm looking for friends."

And then calling pprint() yields the following:

u"(NP+SBAR+S\n  (S\n    (NP (PRP$ my) (NN name))\n    (VP\n      (VBZ is)\n      (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n      (SBAR\n        (WHNP (WP What))\n        (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n    (. ?))\n  (S\n    (NP (PRP I))\n    (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n    (. .)))"

From here, if you want to remove the indentation and newlines, you can split the string on whitespace and join the pieces with single spaces:

# In NLTK 3+, call tree.pformat() here; pprint() prints and returns None
splitted = tree.pprint().split()
flat_tree = ' '.join(splitted)

Executing that yields this for me:

u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))"
Community
RossHochwert
1

NLTK provides functionality to do this right away:

flat_tree = tree._pformat_flat("", "()", False)

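Both tree.pprint() and str(tree) call this method internally, then add extra logic to split the result across multiple lines when needed. Note that the leading underscore marks it as a private method, so it may change between NLTK versions.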
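If you would rather not call a private method, a small recursive function built on the public interface gives a similar result. This is only a sketch: `flatten` and the `Node` stand-in class are hypothetical names introduced here, and the code assumes that every node exposes label(), iterates over its children, and has plain strings as leaves (as nltk.Tree does in NLTK 3).

```python
def flatten(node):
    """Return a one-line bracketed string for a tree-like node.

    Assumes nodes expose label() and iterate over their children,
    and that leaves are plain strings.
    """
    if isinstance(node, str):
        return node
    children = " ".join(flatten(child) for child in node)
    return "({} {})".format(node.label(), children)


class Node(list):
    """Hypothetical stand-in for nltk.Tree so the sketch runs without NLTK."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


tree = Node("S", [
    Node("NP", [Node("DT", ["The"]), Node("NN", ["cat"])]),
    Node("VP", [Node("VBD", ["sat"])]),
])
print(flatten(tree))
```

With a real parse, flatten should accept the nltk.Tree object directly in place of the Node stand-in, provided its leaves are strings.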

caspillaga