I have a YAML file with a folder tree which looks like this:

---
- folder1
- folder2:
    - subfolder1:
        - deepfolder1
    - subolder2
- folder3
- folder4
...

I'm opening it:

import yaml

with open(yaml_file) as f:
    tree = yaml.safe_load(f)

And I want to compare it with a URL path.

I then split the URL path into a list of elements, e.g. ['folder1', 'folder2']:

path_elements = parse.unquote_plus(request_path).split(sep)

request_path is supposed to be a relative link to a folder (without trailing slash).
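For illustration, here is a minimal, self-contained sketch of that split. It assumes `parse` is `urllib.parse` and that `sep` is a forward slash; the helper name `split_request_path` is hypothetical.

```python
from urllib import parse

SEP = "/"  # assumption: URL paths are separated by forward slashes


def split_request_path(request_path):
    """Decode a relative URL path and split it into a list of folder names."""
    return parse.unquote_plus(request_path).split(SEP)


print(split_request_path("folder2/subfolder1"))
print(split_request_path("folder2/my%20folder"))
```

Percent-encoded characters are decoded before the split, so "my%20folder" becomes "my folder".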

I want to check if the request_path lies in the YAML folder tree and then return e.g. True.

But then I'm kind of lost as to how to compare the two objects in a sorted and "pythonic" way.

Everything I come up with has a lot of loops and feels extremely bloated and neither smart nor modern.

I'm using Python 3.4 and am really new to Python in general.

If there is a better way to do it (a different structure in the YAML file or a different approach to comparing these), every suggestion is welcome!

basbebe

1 Answer

You may need to explain a bit more about how you're expecting to "compare" the two objects, but I'm going to guess that your main problem is that you want to turn that nested directory structure into a flat list of paths. I.e. you want this nested structure:

- folder2:
    - subfolder1:
        - deepfolder1
        - deepfolder2

To become a flat list of these:

folder2/subfolder1/deepfolder1
folder2/subfolder1/deepfolder2

This is a form of "tree traversal".

One tricky part here is that normally trees are represented as lists of lists, but your YAML is mixing associative arrays (AKA dicts or hashes) and lists. So this makes the code a little more complicated.
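To make the mix of lists and dicts concrete, this is roughly what the YAML from the question should look like once loaded, written out as a Python literal. The helper `describe` is hypothetical and only labels each top-level node.

```python
# The question's folder tree as the nested Python structure a YAML loader
# should produce: plain strings for leaf folders, single-key dicts for
# folders that have children.
tree = [
    "folder1",
    {"folder2": [
        {"subfolder1": ["deepfolder1"]},
        "subolder2",
    ]},
    "folder3",
    "folder4",
]


def describe(node):
    """Label a node by the Python type that represents it."""
    if isinstance(node, dict):
        return "dict: folder with children"
    if isinstance(node, list):
        return "list: group of folders"
    return "str: leaf folder"


for node in tree:
    print(describe(node))
```

A folder with children is a dict whose single key is the folder name and whose value is the list of children, which is why the traversal has to handle three node types.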

Here is a recursive tree traversal for the data you've given:

def traverse(t, prefix=None):
    prefix = prefix or []

    if len(t) == 0:
        return  # ends the generator; raising StopIteration here is a RuntimeError on Python 3.7+ (PEP 479)
    elif len(t) == 1:
        first, rest = t[0], []
    else:
        first, rest = t[0], t[1:]

    #walk first element
    if isinstance(first, str):
        #it's a single node
        yield prefix + [first]
    elif isinstance(first, list):
        #it's a nested list of nodes, walk it as its own subtree
        for tmp in traverse(first, prefix=prefix):
            yield tmp
    elif isinstance(first, dict):
        #there's another level of nesting
        for sub in first:
            for tmp in traverse(first[sub], prefix=(prefix + [sub])):
                yield tmp

    #walk rest of elements recursively
    for element in traverse(rest, prefix=prefix):
        yield element

for expanded_path in traverse(tree):
    print(expanded_path)

If you're on Python 3.3 or later (you mentioned 3.4), you can use yield from to replace the "for tmp in ...: yield tmp" loops.
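As a sketch of what that cleanup might look like, here is the same first/rest traversal written with `yield from`. Two details differ from the version above and are my assumptions rather than part of the original: the empty case uses a plain `return`, and a nested list is walked as a subtree of its own. The `tree` literal stands in for the loaded YAML.

```python
def traverse(t, prefix=None):
    """Yield every leaf path in the tree as a list of folder names."""
    prefix = prefix or []

    if len(t) == 0:
        return  # nothing left to walk
    first, rest = t[0], t[1:]

    if isinstance(first, str):
        # a single leaf folder
        yield prefix + [first]
    elif isinstance(first, list):
        # a nested list of nodes
        yield from traverse(first, prefix=prefix)
    elif isinstance(first, dict):
        # a folder with children: descend one level
        for sub in first:
            yield from traverse(first[sub], prefix=prefix + [sub])

    # walk the remaining siblings
    yield from traverse(rest, prefix=prefix)


# Stand-in for the loaded YAML from the question
tree = [
    "folder1",
    {"folder2": [{"subfolder1": ["deepfolder1"]}, "subolder2"]},
    "folder3",
    "folder4",
]

for expanded_path in traverse(tree):
    print(expanded_path)
```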

When I run this on your data I get:

['folder1']
['folder2', 'subfolder1', 'deepfolder1']
['folder2', 'subolder2']
['folder3']
['folder4']

These expanded paths are then in the same format as your path_elements variable, so we can now compare them against each other.

You might want to search SO or Python recipes for a better tree traversal algorithm; mine might not be the most efficient. Python also limits recursion depth, and because this version recurses once per sibling as well as once per nesting level, a long folder list can hit that limit, so you might need an iterative version in production.
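One possible iterative version is sketched below. It keeps an explicit stack instead of recursing, so it does not depend on Python's recursion limit. The name `traverse_iterative` is hypothetical, and the sketch assumes the same node types as above (strings, lists, and dicts whose values are lists).

```python
def traverse_iterative(tree):
    """Yield every leaf path without recursion, using an explicit stack."""
    # Each stack entry is (nodes still to visit, path prefix leading to them).
    stack = [(list(tree), [])]
    while stack:
        nodes, prefix = stack.pop()
        if not nodes:
            continue
        first, rest = nodes[0], nodes[1:]
        # Push the remaining siblings first so that `first` and everything
        # below it is handled before them (the stack is last-in, first-out).
        stack.append((rest, prefix))
        if isinstance(first, str):
            yield prefix + [first]
        elif isinstance(first, list):
            stack.append((first, prefix))
        elif isinstance(first, dict):
            # reversed() keeps the dict's keys in their original order
            for name in reversed(list(first)):
                stack.append((first[name], prefix + [name]))


# Stand-in for the loaded YAML from the question
tree = [
    "folder1",
    {"folder2": [{"subfolder1": ["deepfolder1"]}, "subolder2"]},
    "folder3",
    "folder4",
]

for expanded_path in traverse_iterative(tree):
    print(expanded_path)
```

The slicing in `nodes[1:]` copies the list on every step, which is fine for small trees; for very large ones an index or iterator per stack entry would avoid the copies.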

Edit: In response to your comment: "return 'True' if the request_path is within the tree structure", you just need to loop over the expanded paths and see if request_path matches any of them:

def compare(request_path, tree):
    path_elements = parse.unquote_plus(request_path).split(sep)
    for expanded_path in traverse(tree):
        if expanded_path == path_elements:
            return True
    return False
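Note that traverse yields only complete leaf paths, so with the sample data a request for an intermediate folder such as folder2 or folder2/subfolder1 would not match. If those should count as well, one alternative is to skip the flattening and walk down the tree one path element at a time. The function `path_in_tree` below is a hypothetical sketch of that idea, not part of the code above.

```python
def path_in_tree(tree, path_elements):
    """Return True if path_elements names a folder at any depth of the tree."""
    level = tree
    for name in path_elements:
        children = None
        for node in level or []:
            if node == name:
                # leaf folder: it exists but has no children
                children = []
                break
            if isinstance(node, dict) and name in node:
                # folder with children: descend into them
                children = node[name] or []
                break
        if children is None:
            return False  # no folder with this name at this level
        level = children
    # every element was found (an empty list matches the root itself)
    return True


# Stand-in for the loaded YAML from the question
tree = [
    "folder1",
    {"folder2": [{"subfolder1": ["deepfolder1"]}, "subolder2"]},
    "folder3",
    "folder4",
]

print(path_in_tree(tree, ["folder2", "subfolder1"]))
print(path_in_tree(tree, ["folder2", "missing"]))
```

This visits only the branch named by the request, so it never has to expand the whole tree.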

But it depends somewhat on what's in request_path: is it a full URL (http://www.blah.com/foo/boo.txt), an absolute path (/foo/boo.txt), or a relative path (foo/boo.txt)? For anything other than a relative path you might need to clean it up before comparing. That's all pretty easy to do though (search SO for splitting paths and URLs); walking the tree is the complex part.
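As a rough sketch of that cleanup, the standard library's `urllib.parse.urlsplit` can reduce all three forms to the same list of elements. The helper name `to_path_elements` is hypothetical, and dropping empty segments (leading, trailing, or doubled slashes) is my assumption about what "clean" should mean here.

```python
from urllib.parse import unquote_plus, urlsplit


def to_path_elements(request_path):
    """Reduce a full URL, absolute path, or relative path to folder names."""
    # urlsplit() separates scheme and host (if any) from the path component.
    path = urlsplit(request_path).path
    # Split first, then decode each part, and skip empty segments.
    return [unquote_plus(part) for part in path.split("/") if part]


print(to_path_elements("http://www.blah.com/foo/boo.txt"))
print(to_path_elements("/foo/boo.txt"))
print(to_path_elements("foo/boo.txt"))
```

Decoding after the split, rather than before it as in compare, means an encoded slash (%2F) inside a folder name is not mistaken for a separator.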

Steven Kryskalla
  • thanks so far! I added a more specific goal to the question: return 'True' if the request_path is within the tree structure. – basbebe Feb 24 '15 at 19:38
  • That part's easy once you have the list of expanded paths. Just loop over all the expanded paths and compare each one against the request_path. I edited my answer with this info. – Steven Kryskalla Feb 24 '15 at 19:48