-1

I want to write a recursive function that pretty prints a file hierarchy with some indentation depending on the level of depth inside the hierarchy. For each item printed, i want to display its size as well (if the file is a directory, it should also print the total size of the items beneath it). For example, giving the following file structure:

Folder1
   File.txt (30kb)
   SubFolder1
      image.jpg (300kb)
      dummy.c (100kb)

When running the function starting from Folder1's path, my program should print something like this:

Folder1 - size: 4096; total_size: 442288
\t File.txt - size: 30000;
\t SubFolder1 - size: 4096; total_size: 408192
\t \t image.jpg - size: 300000;
\t \t dummy.c - size: 100000;

My approach was to use a plain dfs and print the nodes in visiting order but I've soon discovered that this won't work because in case of folders I have to first compute the size of its children which means visiting them in the first place, breaking the printing order.

My solution was to store info about each file/folder as it's being discovered by the dfs and then use a second function that does the printing based on that info.

import os

def du(path):
    v = []
    dfs(path, 0, v)
    for entry in v:
        s = '\t' * entry[0]
        s += '{} - du: {}; total_du: {}\n'.format(entry[1], entry[2], entry[3])
        print(s)

def dfs(root='.', depth=0, v=[]):
    root_du = os.path.getsize(root)
    root_total_du = root_du
    v.append([depth, root, root_du, root_total_du])
    i = len(v) - 1

    for entry in os.listdir(root):
        entry_path = os.path.join(root, entry)
        entry_du = os.path.getsize(entry_path)
        entry_total_du = 0
        v.append([depth + 1, entry, entry_du, entry_total_du])
        j = len(v) - 1

        if os.path.isdir(entry_path):
            entry_total_du += dfs(entry_path, depth + 1, v)
            v[j][3] = entry_total_du
            root_total_du += entry_total_du
        else:
            root_total_du += entry_du

    v[i][3] = root_total_du
    return root_total_du

My question is whether there is a more efficient approach of doing this, maybe without using that extra list or a second traversal.

itma96
  • 87
  • 6
  • Does [Calculating a directory's size using Python?](https://stackoverflow.com/questions/1392413/calculating-a-directorys-size-using-python) answer your question? - There are answers showing how to get the total size of everything in a directory *on-the-fly*. – wwii Jan 10 '22 at 21:10
  • .. [very quickly getting total size of folder](https://stackoverflow.com/questions/2485719/very-quickly-getting-total-size-of-folder) – wwii Jan 10 '22 at 22:04
  • @wwii Well, it definitely helps but i was looking to write my own solution so that i can practice a bit of algo along the way :) – itma96 Jan 11 '22 at 09:18

1 Answers1

1

I don't think it would be possible without splitting it up into two passes, given the nature of the problem

  • computing the size of each folder requires something like a post-order traversal where you visit each of children first and return the total to the parent

  • printing the folders from top to bottom in the way that you want might use a pre-order traversal where you print out the parent folder first, followed by each of the children

I would like to say that these two conditions are the bare minimum that must be met in order to solve the problem, and because it's impossible to meet both conditions using a single traversal, there isn't a solution that doesn't involve performing at least a second traversal.

MxLDevs
  • 19,048
  • 36
  • 123
  • 194