-1

Possible Duplicate:
Create a tree-style directory listing in Python

I would like to analyze the file system and print the result as formatted file. My first implementation can simply be plain text, but at a later date I would like to incorporate HTML.

I am using a basic method to gather the files contained within each folder, which I can return and send to the text processor:

def gather_tree(source):
        tree = {}
        for path, dirs, files in os.walk(source):
            tree[path] = []
            for file in files:
                tree[path].append(file)
        return tree

Clearly, the problem here is that I am producing a structure that has no concept of depth which I guess I need to be able to correctly format the list with adequate space and nesting.

My currently very basic print pattern looks like this:

def print_file_tree(tree):
    # Prepare tree
    dir_list = tree.keys()
    dir_list.sort()

    for directory in dir_list:
        print directory
        for f in tree[directory]:
            print f

I am kind new to data structures and would appreciate some input!

Community
  • 1
  • 1
james_dean
  • 1,477
  • 6
  • 26
  • 37
  • Well, your tree isn't really a tree, as you already noted :) Trees are made up of **nodes** that have **parents** and **children**. The Wikipedia article on [graph theory](http://en.wikipedia.org/wiki/Graph_theory) might be a first good read. But, what data structure is best suited depends very much on what you want to do with the data - maybe you don't even need a tree. What kind of work will your text processor do? – Lukas Graf Sep 26 '12 at 21:21
  • First, I'd like to be able to print the file in a suitable format, say XML and HTML. Then I'd like to gather information about the types of files contained within the lising. – james_dean Sep 26 '12 at 21:24
  • Then this seems to be closely related to [this question](http://stackoverflow.com/questions/2104997/os-walk-python-xml-representation-of-a-directory-structure-recursion) (although I would use [lxml](http://lxml.de/) for generating the XML Tree). – Lukas Graf Sep 26 '12 at 21:31
  • Is there a reason you need to build a data structure in the first place? As an iterator, os.walk effectively _is_ a data structure. If you just need to parse it in a single pass to generate your output, you don't need to build anything intermediate. (Alternatively, you can build your own recursive iterator very simply to replace os.walk, and again just use the iterator as a data structure.) If you need to do some additional logic that requires you to have, e.g., a random-accessible tree-like structure, tell us what that logic is. – abarnert Sep 26 '12 at 22:59

1 Answers1

2

If you plan to create XML from your data, you will in fact have to create a tree-like structure. This can be an intermediate tree that you build, annotate and possible iterate or traverse over again, and then transform to an ElementTree for creating XML. Or you can directly construct an ElementTree using lxml's ElementTree API.

Either way, using os.walk() is not the way to go. This may sound counter-intuitive, but the point is this: os.walk() serializes (flattens) the file system tree for you, so you can easily iterate over it and won't have to deal with writing a recursive function that does that. In your case however, you want to retain the tree structure, therefore it's much easier if you write that recursive function yourself.

This is an example for how you can build an ElementTree using lxml.

(This code is loosely based on @MikeDeSimone's answer to a similar question)

import os
from lxml import etree


def dir_as_tree(path):
    """Recursive function that walks a directory and returns a tree
    of nested etree nodes.
    """
    basename = os.path.basename(path)
    node = etree.Element("node")
    node.attrib['name'] = basename
    # Gather some more information on this path here
    # and write it to attributes
    # ...
    if os.path.isdir(path):
        # Recurse
        node.tag = 'dir'
        for item in sorted(os.listdir(path)):
            item_path = os.path.join(path, item)
            child_node = dir_as_tree(item_path)
            node.append(child_node)
        return node
    else:
        node.tag = 'file'
        return node

# Create a tree of the current working directory
cwd = os.getcwd()
root = dir_as_tree(cwd)

# Create an element tree from the root node
# (in order to serialize it to a complete XML document)
tree = etree.ElementTree(root)

xml_document = etree.tostring(tree,
                              pretty_print=True,
                              xml_declaration=True,
                              encoding='utf-8')
print xml_document

Example output:

<?xml version='1.0' encoding='utf-8'?>
<dir name="dirwalker">
  <dir name="top1">
    <file name="foobar.txt"/>
    <dir name="sub1"/>
  </dir>
  <dir name="top2">
    <dir name="sub2"/>
  </dir>
  <dir name="top3">
    <dir name="sub3">
      <dir name="sub_a"/>
      <dir name="sub_b"/>
    </dir>
  </dir>
  <file name="topfile1.txt"/>
  <file name="walker.py"/>
</dir>
Community
  • 1
  • 1
Lukas Graf
  • 30,317
  • 8
  • 77
  • 92