I want to write a recursive function that pretty prints a file hierarchy with some indentation depending on the level of depth inside the hierarchy. For each item printed, i want to display its size as well (if the file is a directory, it should also print the total size of the items beneath it). For example, giving the following file structure:
Folder1
File.txt (30kb)
SubFolder1
image.jpg (300kb)
dummy.c (100kb)
When running the function starting from Folder1's path, my program should print something like this:
Folder1 - size: 4096; total_size: 442288
\t File.txt - size: 30000;
\t SubFolder1 - size: 4096; total_size: 408192
\t \t image.jpg - size: 300000;
\t \t dummy.c - size: 100000;
My approach was to use a plain dfs and print the nodes in visiting order but I've soon discovered that this won't work because in case of folders I have to first compute the size of its children which means visiting them in the first place, breaking the printing order.
My solution was to store info about each file/folder as it's being discovered by the dfs and then use a second function that does the printing based on that info.
import os
def du(path):
v = []
dfs(path, 0, v)
for entry in v:
s = '\t' * entry[0]
s += '{} - du: {}; total_du: {}\n'.format(entry[1], entry[2], entry[3])
print(s)
def dfs(root='.', depth=0, v=[]):
root_du = os.path.getsize(root)
root_total_du = root_du
v.append([depth, root, root_du, root_total_du])
i = len(v) - 1
for entry in os.listdir(root):
entry_path = os.path.join(root, entry)
entry_du = os.path.getsize(entry_path)
entry_total_du = 0
v.append([depth + 1, entry, entry_du, entry_total_du])
j = len(v) - 1
if os.path.isdir(entry_path):
entry_total_du += dfs(entry_path, depth + 1, v)
v[j][3] = entry_total_du
root_total_du += entry_total_du
else:
root_total_du += entry_du
v[i][3] = root_total_du
return root_total_du
My question is whether there is a more efficient approach of doing this, maybe without using that extra list or a second traversal.