
I am facing difficulty in getting an XML structure listing all the directories/sub-directories inside a given directory. I got that working using the recursion in the given post. My problem is a little bit tougher than usual: I have directories that may have 10000s of files in them, so checking every entry to see if it is a directory or not is going to be costly, and it is already taking too long to build the XML. I want to build the XML for directories only.

I know Linux has a command like `find . -type d` to list only the directories present (not the files). How can I achieve this in Python?

Thanks in advance.

Anurag Sharma

3 Answers


os.walk already distinguishes between files and directories:

import os

def find_all_dirs(root='.'):
    # os.walk yields (path, dirs, files); yielding from `dirs` skips files entirely
    for path, dirs, files in os.walk(root):
        for d in dirs:
            yield os.path.join(path, d)
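
If you only want directories in the first place, note that with the default top-down walk you can also prune `dirs` in place, so os.walk never descends into sub-trees you don't need. A small sketch of that, where skipping hidden directories is just an example filter:

for path, dirs, files in os.walk('.'):
    # Editing `dirs` in place (with the default topdown=True) stops os.walk
    # from descending into the removed entries at all.
    dirs[:] = [d for d in dirs if not d.startswith('.')]
    for d in dirs:
        print(os.path.join(path, d))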
phihag
  • @phihag os.walk recurses itself. I am avoiding this because I have to create an XML of the sub-directories present inside a given directory – Anurag Sharma Sep 27 '12 at 15:17
  • Sorry, what do you mean by "recurses itself"? If you're only creating a *file*, that shouldn't have an impact on a tree of *directories*. But even if you'd create a directory, you can simply store the whole tree in a variable with `dirs = list(find_all_dirs())` and then operate on that list. – phihag Sep 27 '12 at 15:19
  • @phihag I was trying to build the XML through recursion itself. Yes, I can store the whole tree in a variable and create an XML out of it. – Anurag Sharma Sep 27 '12 at 15:25
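
Picking up the suggestion from the comments above (store the directory tree first, then serialise it), here is a hedged sketch that lets os.walk drive the traversal and builds an xml.etree.ElementTree document from the directories only; the element and attribute names are my own choice, not from the original post:

import os
import xml.etree.ElementTree as ET

def dirs_as_xml(root='.'):
    root = os.path.abspath(root)
    root_elem = ET.Element('dir', name=os.path.basename(root), path=root)
    nodes = {root: root_elem}  # map each visited directory to its XML element
    for path, dirs, files in os.walk(root):
        for d in dirs:
            full = os.path.join(path, d)
            # A top-down walk guarantees the parent `path` was seen before `full`.
            nodes[full] = ET.SubElement(nodes[path], 'dir', name=d, path=full)
    return ET.ElementTree(root_elem)

# Example: dirs_as_xml('.').write('dirs.xml')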

For just one directory...

import os

def get_dirs(p):
    # Keep only the entries of p that are themselves directories.
    p = os.path.abspath(p)
    return [n for n in os.listdir(p) if os.path.isdir(os.path.join(p, n))]

print("\n".join(get_dirs(".")))
spiralx
  • @spiralx No, I am not in favor of using os.path.isdir; I don't want to check every entry to see if it is a directory or not, as a directory may contain 10000s of files – Anurag Sharma Sep 27 '12 at 17:09
  • Well, that's about as fast as you're going to get using the standard library, I think; there's no command to just get directories - how would it work? You could use the `subprocess` module to execute a binary to do it, but I don't know if it would be quicker. – spiralx Sep 27 '12 at 17:38

Here is the solution I got after searching and trying different things. I am not saying that this is faster than the method of checking every entry in a directory, but it actually produces the result much more quickly (the difference is visible when a directory contains thousands of files).

import os
import subprocess
from xml.sax.saxutils import quoteattr as xml_quoteattr

def DirAsLessXML(path):

    result = '<dir type={0} name={1} path={2}>\n'.format(
        xml_quoteattr('dir'), xml_quoteattr(os.path.basename(path)), xml_quoteattr(path))

    # Ask the external `find` command for the immediate sub-directories only.
    output = subprocess.Popen(['find', path, '-maxdepth', '1', '-type', 'd'],
                              stdout=subprocess.PIPE, shell=False).communicate()[0]

    output_list = output.splitlines()
    # `find` always prints `path` itself first, so a single line means there are
    # no sub-directories and the node is marked as a leaf.
    if len(output_list) == 1:
        result = '<dir type={0} name={1} path={2}>\n'.format(
            xml_quoteattr('leaf_dir'), xml_quoteattr(os.path.basename(path)), xml_quoteattr(path))

    for item in output_list[1:]:
        result += '\n'.join('  ' + line for line in DirAsLessXML(item).split('\n'))
    result += '</dir>\n'
    return result
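
To turn the returned string into an actual file, a small driver along these lines should do; the output filename and the starting directory are only examples:

if __name__ == '__main__':
    with open('dir_tree.xml', 'w') as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(DirAsLessXML('.'))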
Anurag Sharma