Neither POSIX nor Windows lets you get all of that data in one OS call. At a minimum, for POSIX, there will be three calls per directory (`opendir`, `readdir`, `closedir`), plus one per directory entry (`stat`).
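In Python, that per-directory pattern looks roughly like the sketch below: `os.scandir` covers the open/read/close sequence for one directory, and each `os.stat` is a separate per-entry call. The function name `list_with_sizes` is hypothetical, chosen only for illustration.

```python
import os


def list_with_sizes(path):
    """Return {entry_name: size_in_bytes} for a single directory.

    os.scandir() performs the open/read/close sequence for `path`;
    the os.stat() call below is the separate per-entry system call.
    """
    sizes = {}
    with os.scandir(path) as it:  # the directory handle is closed on exit
        for entry in it:
            # One stat per entry; follow_symlinks=False reports a link itself.
            st = os.stat(entry.path, follow_symlinks=False)
            sizes[entry.name] = st.st_size
    return sizes
```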
I believe that what follows will result in fewer OS calls than what you posted. Yes, the `os.walk()` call is lazy; that is, the entire tree is not in memory upon the return from `walk()`, but is rather read in piecemeal during the calls to `next()`.
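A quick way to see that laziness: creating the `os.walk()` generator does no I/O, and in the default top-down mode a single `next()` returns only the first directory's listing. The helper name `first_listing` is hypothetical.

```python
import os


def first_listing(root):
    """Return (dirpath, dirnames, filenames) for `root` only.

    os.walk() is a generator, so creating it reads nothing. With the
    default topdown=True, the first next() lists `root` and yields
    before descending into any subdirectory.
    """
    walker = os.walk(root)
    try:
        return next(walker)
    finally:
        walker.close()  # abandon the rest of the walk explicitly
```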
Thus, my version will read only the first-level descendant directories, and will `stat` only the immediate children and grandchildren. Your version will do that work for all of the great-grandchildren as well, however deep your directory structure goes.
```python
import os

root = '.'
grandChildren = []
for kid in next(os.walk(root))[1]:
    # next() on a fresh walk yields only (dirpath, dirnames, filenames) for kid
    x = next(os.walk(os.path.join(root, kid)))
    for grandKid in x[1]:  # (or x[1]+x[2] if you care about regular files)
        grandChildren.append(os.path.join(x[0], grandKid))
```
Or, as a list comprehension instead of a for loop:
```python
import os

root = '.'
grandChildren = [
    os.path.join(root, kid, grandKid)  # include root so roots other than '.' work
    for kid in next(os.walk(root))[1]
    for grandKid in next(os.walk(os.path.join(root, kid)))[1]]
```
Finally, factoring out the `os.walk` calls into a function:
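```python
import os


def read_subdirs(path='.'):
    """Yield the paths of the immediate subdirectories of `path`."""
    # next() reads only `path`; the default tuple covers unreadable dirs,
    # for which os.walk yields nothing and a bare next() would raise.
    _, subdirs, _ = next(os.walk(path), (path, [], []))
    return (os.path.join(path, x) for x in subdirs)


root = '.'
grandChildren = [
    grandKid
    for kid in read_subdirs(root)
    for grandKid in read_subdirs(kid)]
```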
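The same idea extends to any fixed depth: list a directory with one `next(os.walk(...))`, then recurse only into the subdirectories you still need. This is a sketch; `dirs_at_depth` is a hypothetical name, and treating unreadable directories as empty is an assumption of this example.

```python
import os


def dirs_at_depth(root, depth):
    """Return paths of directories exactly `depth` levels below `root`.

    Each recursive step lists a single directory, so directories deeper
    than `depth` are never opened.
    """
    if depth == 0:
        return [root]
    # Default tuple: treat an unreadable directory as having no subdirectories.
    _, subdirs, _ = next(os.walk(root), (root, [], []))
    result = []
    for name in subdirs:
        result.extend(dirs_at_depth(os.path.join(root, name), depth - 1))
    return result
```

With `depth=2` this opens the same directories as the code above: the root and each of its children.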
From testing, we can see that my version calls `stat` far fewer times than yours does when there are great-grandchildren.
In my home directory, for example, I ran my code (`/tmp/a.py`) and yours (`/tmp/b.py`) with `root` set to `'.'` in each case:
```
$ strace -e stat python /tmp/a.py 2>&1 > /dev/null | egrep -c stat
1245
$ strace -e stat python /tmp/b.py 2>&1 > /dev/null | egrep -c stat
36049
```
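If `strace` isn't handy, a rough stand-in is to count directory listings instead of `stat` calls: a top-down `os.walk` yields one tuple per directory it reads, so counting tuples counts directories opened. The function names are hypothetical, and using a full `os.walk` traversal to represent the other approach is an assumption, since that code isn't reproduced here.

```python
import os


def listings_full_walk(root):
    """Directories read when an os.walk() traversal runs to completion."""
    return sum(1 for _ in os.walk(root))


def listings_two_levels(root):
    """Directories read by the two-level approach: root plus each child."""
    count = 0
    _, kids, _ = next(os.walk(root))  # one listing: root itself
    count += 1
    for kid in kids:
        next(os.walk(os.path.join(root, kid)))  # one listing per child
        count += 1
    return count
```

On a tree with great-grandchildren, the full-walk count grows with the whole tree, while the two-level count stays at one plus the number of children.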