I am very new to Python, but I have read through the w3schools tutorial before starting out.

A recent web search led me to this helpful script which produces a JSON representation of a file tree.

#!/usr/bin/env python

import os
import errno

def path_hierarchy(path):
    hierarchy = {
        'type': 'folder',
        'name': os.path.basename(path),
        'path': path,
    }

    try:
        hierarchy['children'] = [
>>>         path_hierarchy(os.path.join(path, contents))
            for contents in os.listdir(path)
        ]
    except OSError as e:
        if e.errno != errno.ENOTDIR:
            raise

        if os.path.basename(path).endswith('doc') or os.path.basename(path).endswith('docx'):
            hierarchy['type'] = 'file'
        else:
+++         hierarchy = None


    return hierarchy

if __name__ == '__main__':
    import json
    import sys

    try:
        directory = sys.argv[1]
    except IndexError:
        directory = "/home/something/something"

    print(json.dumps(path_hierarchy(directory), indent=4, sort_keys=True))

I have two questions:

  1. At the position marked by ">>>", why doesn't the for statement precede the call to the function path_hierarchy?

  2. How do I avoid adding a hierarchy object for a file which is neither "doc" nor "docx"? I experimented with setting the hierarchy object to None at the line marked "+++", but this simply produced a "null" in the JSON output. What I would like is no entry at all unless the current item is a folder or a type allowed by my test (in this case either 'doc' or 'docx').

Basil Bear
  • 1. Look up "list comprehension" in your Python tutorial. – Barmar May 05 '20 at 15:16
  • @Barmar thank you for that suggestion – Basil Bear May 05 '20 at 15:19
  • @Carcigenicate My understanding is that the `json.dumps` call is only made once at the end of the process, as opposed to once for each entry. Am I wrong on that? – Basil Bear May 05 '20 at 15:20
  • For searching for `doc` or `docx`, you can do something like [here](https://stackoverflow.com/questions/3964681/find-all-files-in-a-directory-with-extension-txt-in-python). If what you mean by `2` is that you want the file to be empty, just return an empty string ('') and not None. – modesitt May 05 '20 at 15:20

1 Answer

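For 1, that's a list comprehension. It builds a new list from another iterable, and its syntax puts the expression that produces each element first, followed by the for clause. That's why the call to path_hierarchy appears before the for.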
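As a rough illustration of that ordering (the list and variable names below are made up for the example, not taken from your script), a comprehension and the equivalent explicit loop look like this:

```python
names = ["a.doc", "b.txt", "c.docx"]  # hypothetical sample data

# Comprehension: the element expression comes first, the for clause after it.
upper_comp = [name.upper() for name in names]

# Equivalent explicit loop: the for comes first, the expression inside it.
upper_loop = []
for name in names:
    upper_loop.append(name.upper())

print(upper_comp == upper_loop)
```

Both forms evaluate name.upper() once per item; the comprehension just writes the "what to add" part before the "for each" part.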


For 2, the real problem is that you don't want None values to be added to hierarchy['children']. There are a couple of ways to do that, but I'd just modify your >>> line so that each child is checked before it goes into the list.

If you have Python 3.8+, you can make use of an assignment expression (:=) in an if clause of the list comprehension. The assignment has to happen in the if clause, because the condition is evaluated before the element expression:

hierarchy['children'] = [
    child
    for contents in os.listdir(path)
    # Bind the result to child, and only keep it if it is truthy (not None)
    if (child := path_hierarchy(os.path.join(path, contents)))
]

Without Python 3.8, you need to convert that chunk to a full for loop:

hierarchy['children'] = []
for contents in os.listdir(path):
    child = path_hierarchy(os.path.join(path, contents))
    if child:
        hierarchy['children'].append(child)

Both are essentially equivalent.

The takeaway here though is to just check what the child is before adding it to the tree.
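Putting that together, here is a minimal, self-contained sketch of the whole function with the check applied. It makes a few assumptions that are not in your script: it uses os.path.isdir instead of catching ENOTDIR, it sorts the directory listing so the output order is stable, and ALLOWED_SUFFIXES is a name I made up for your 'doc'/'docx' test.

```python
import os

# Hypothetical constant mirroring the question's endswith('doc') / endswith('docx') test.
ALLOWED_SUFFIXES = ('doc', 'docx')


def path_hierarchy(path):
    """Return a dict describing path, or None for files that should be skipped."""
    name = os.path.basename(path)

    # Assumption: isdir() replaces the original try/except on ENOTDIR.
    if not os.path.isdir(path):
        if name.endswith(ALLOWED_SUFFIXES):
            return {'type': 'file', 'name': name, 'path': path}
        return None  # Not an allowed file type: the caller will drop it.

    children = []
    for contents in sorted(os.listdir(path)):
        child = path_hierarchy(os.path.join(path, contents))
        if child is not None:  # Check the child before adding it to the tree.
            children.append(child)

    return {'type': 'folder', 'name': name, 'path': path, 'children': children}
```

Note that a folder containing no allowed files still appears in the output, with an empty children list. If you would rather drop those too, you could return None when children is empty.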

Carcigenicate
  • Thank you, most clear. I have now followed up on the advice to read about list comprehension, and I get the idea. Much appreciated. – Basil Bear May 05 '20 at 15:42