3

In Summary

I am creating a tree structure out of a text file input using a function from SO question: Python file parsing: Build tree from text file. But I am able to produce my tree only by using a global variable and cannot find a way of avoiding this.

Input Data

In a file called data.txt I have the following:

Root
-A 10
-B
--B A 2
--B B 5
--B Z 9
--B X
---X 4
---Y
----Y0 67
----Y1 32
---Z 3
-C 19

Desired Result

{'B': ['B A 2', 'B B 5', 'B Z 9', 'B X'],
 'B X': ['X 4', 'Y', 'Z 3'],
 'Root': ['A 10', 'B', 'C 19'],
 'Y': ['Y0 67', 'Y1 32']}

My Code

import re, pprint
PATTERN = re.compile('^[-]+')
tree = {}

def _recurse_tree(parent, depth, source):
    last_line = source.readline().rstrip()
    while last_line:
        if last_line.startswith('-'):
            tabs = len( re.match(PATTERN, last_line).group() )
        else:
            tabs = 0
        if tabs < depth:
            break
        node = re.sub(PATTERN, '', last_line.strip())
        if tabs >= depth:
            if parent is not None:
                print "%s: %s" %(parent, node)
                if parent in tree:
                    tree[parent].append(node)
                else:
                    tree[parent] = [ node, ]
            last_line = _recurse_tree(node, tabs+1, source)
    return last_line

def main():
    inFile = open("data.txt")
    _recurse_tree(None, 0, inFile)
    pprint.pprint(tree)

if __name__ == "__main__":
    main()

The Problem

How do I get rid of the global variable tree? Everything I do seems to make the code much longer or uglier but I would like to use the function heavily and I hate depending on side-effects for the core result.

Supplement

After the answers below, I revised the code to return the tree in the following way. Is this pythonic? Returning a tuple and then tossing the first element seems inelegant.

def _recurse_tree(parent, depth, source, tree=None):
    if tree is None:
        tree = {}
    last_line = source.readline().rstrip()
    while last_line:
        if last_line.startswith('-'):
            tabs = len( re.match(PATTERN, last_line).group() )
        else:
            tabs = 0
        if tabs < depth:
            break
        node = re.sub(PATTERN, '', last_line.strip())
        if tabs >= depth:
            if parent is not None:
                print "%s: %s" %(parent, node)
                if parent in tree:
                    tree[parent].append(node)
                else:
                    tree[parent] = [ node, ]
            last_line, tree = _recurse_tree(node, tabs+1, source, tree)
    return last_line, tree

def main():
    inFile = open("data.txt")
    tmp, tree = _recurse_tree(None, 0, inFile)
    pprint.pprint(tree)
Community
  • 1
  • 1
daedalus
  • 10,873
  • 5
  • 50
  • 71

3 Answers3

4

Your tree variable is already a mutable; just pass it along with your recursive calls:

def _recurse_tree(parent, depth, source, tree=None):
    if tree is None:
        tree = {}

    last_line = source.readline().rstrip()
    while last_line:
        if last_line.startswith('-'):
            tabs = len( re.match(PATTERN, last_line).group() )
        else:
            tabs = 0
        if tabs < depth:
            break
        node = re.sub(PATTERN, '', last_line.strip())
        if tabs >= depth:
            if parent is not None:
                print "%s: %s" %(parent, node)
                if parent in tree:
                    tree[parent].append(node)
                else:
                    tree[parent] = [ node, ]
            last_line = _recurse_tree(node, tabs+1, source, tree)
    return last_line

Alternatively, you can use a class to hold the state, it'll be easier to then pluck the state from the instance:

class TreeBuilder(object):
    _PATTERN = re.compile('^[-]+')

    def __init__(self, source):
        self.tree = {}
        self.source = source
        self._recurse_tree()

    def _recurse_tree(self, parent=None, depth=0):
         last_line = self.source.readline().rstrip()
         while last_line:
             if last_line.startswith('-'):
                 tabs = len( self._PATTERN.match(last_line).group() )
             else:
                 tabs = 0
             if tabs < depth:
                 break
             node = self._PATTERN.sub('', last_line.strip())
             if tabs >= depth:
                 if parent is not None:
                     print "%s: %s" %(parent, node)
                     if parent in self.tree:
                         self.tree[parent].append(node)
                     else:
                         self.tree[parent] = [ node, ]
                 last_line = self._recurse_tree(node, tabs+1)
         return last_line

Then use this like this:

def main():
    inFile = open("data.txt")
    builder = TreeBuilder(inFile)
    pprint.pprint(builder.tree)
daedalus
  • 10,873
  • 5
  • 50
  • 71
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Thanks Martijn -- I am accepting this solution. Can I get a comment on my supplementary question? I am seeking an elegant way of retrieving the `tree` once it is created and the recursion is finished. – daedalus Nov 17 '12 at 18:35
  • 1
    @gauden: we usually try to keep it to *one* question per post. I'd say it's fine in this case to toss one of the return values of your recursive function, yes. Alternatively, use a class to hold the state instead and then at the end just retrieve the built tree from the class instance you used. – Martijn Pieters Nov 17 '12 at 18:37
  • Thanks @martijn-pieters. I've been around SO long enough to know the rules, so apologies for overstepping this time. – daedalus Nov 17 '12 at 18:39
  • 1
    @gauden: your in luck; I've given you a class-based version anyway, which makes it more pythonic and gives you the tree as an easy-to-reach attribute. – Martijn Pieters Nov 17 '12 at 18:45
1

I think good solution here — create class and put tree inside, like a private class member.

Or you can simply pass this dictionary as one of the parameters in function and pass it during recursion. It'll pass by reference, so all time function will use the same dictionary without global variable. But I'd prefer class.

cleg
  • 4,862
  • 5
  • 35
  • 52
0

Use tree as default parameter in your function: -

def _recurse_tree(parent, depth, source, tree = None):
    if tree is None:  # needed on first invocation
        tree = {}

Call it first time without tree parameter, and on every successive call, add another parameter tree to it.

So, from inside your method your recursive call becomes: -

last_line = _recurse_tree(node, tabs+1, source, tree)
Rohit Jain
  • 209,639
  • 45
  • 409
  • 525
  • 1
    Ah, I think that giving the advice to use a mutable as the default is the problem here, perhaps? :-) Surely you are familiar with ["Least Astonishment" in Python: The Mutable Default Argument](http://stackoverflow.com/q/1132941)? – Martijn Pieters Nov 17 '12 at 18:17
  • @MartijnPieters.. But, that's what is needed. Why would it matter here. `tree` is getting changed on successive call. Still can't understand why the downvote. :( – Rohit Jain Nov 17 '12 at 18:18
  • @MartijnPieters.. Yeah, that thing I know. But that would matter if we invoke that method multiple times from outside recursion. In recursion, would it matter? I think it should not. – Rohit Jain Nov 17 '12 at 18:19
  • 1
    You can now use this function (from `main()`) only *once*. Next time you call it it'll have the data from the last call still there. – Martijn Pieters Nov 17 '12 at 18:19
  • @MartijnPieters.. Yeah that can be a problem, which I understand. I think I would change it to `None`. – Rohit Jain Nov 17 '12 at 18:21
  • Yep, define this `def a(tree = {}):tree[len(tree)] = 1;return tree;`, and then call `a()` multiple times. – John Vinyard Nov 17 '12 at 18:21