Python - Mapping job dependencies in tree structure

Question

I have list of jobs with dependencies and their status. I want to go through each job and map out their dependencies as a tree. The end of the tree will be when both parent dependencies are a status of done - Which means traversal should stop. I'm not sure if I should use recursion or if that's really the only possibility. Is there native mapping tools or data structures that can help me? I'm going to be iterating close to 10K jobs.

Psuedo Code

def map_depends(job_depends):
    for job in job_depends:
        if job.status = done:
            job_tree_map.append(job.name)
        else:
            map_depends(job.get('dependencies'))

def main():       
    for job in batch:
                if job.get('dependencies'):
                    map_depends(job.get('dependencies'))

Visual description of what I'm talking about.

         -> job_depends1.status = done
main_job                                              -> job_depends3 = running -> job_depends6 = done
         -> job_depends2 = running......job_depends2  -> jon_depends4 = done
                                                      ->  job_depends5 = done

Your code looks like a bit inconsistent: `job` vs `jobs`, `job.get(depends)` vs `job.get('dependencies')`. — firegurafiku, Oct 05 '14 at 01:27

score 2 · Answer 1 · answered Oct 05 '14 at 02:21

Trees are naturally recursive datastructures, however, anything that can be written recursively can be done iteratively using explicit stacks.

Sadly for you, you don't get "yield from" until CPython 3.[34], which makes recursive generators impractical in earlier versions.

So if you're still targeting 2.7, you probably should write things recursively first, get them working that way, and then do your generators nonrecursively.

For a stack, a lot of Python developers will use append and pop on a list, or acollections.deque. I might use http://stromberg.dnsalias.org/~strombrg/linked-list/ for the sake of abstraction, though that's often slow in CPython.

score 0 · Answer 2 · edited May 23 '17 at 10:33

If you tree is fairly deep, using recursive algorithm may cause a stack overflow, because CPython by default has a limit of 1000 nested calls. You can set you own limit value with sys.setrecursionlimit but it is generally not recommended. So, in your case it may be better to rewrite tree traversal in the iterative way.

For example, you may write a function like what for tree filtering and traversal (not in pseudo code):

def walkjob(job):
    if job is None:
        return []

    result = [job, ]
    cursor = 0

    while cursor < len(result):
        job = result[cursor]
        result.extend(list(
            filter(lambda j: j.done,
            job.children)))
        cursor += 1

    return result

Here its usage in REPL:

>>> from bunch import Bunch
>>> job = Bunch(value=1, done=True, children=[
...           Bunch(value=2, done=True, children=[]),
...           Bunch(value=3, done=True, children=[
...               Bunch(value=4, done=False, children=[]),
...               Bunch(value=5, done=True, children=[])])])
...
>>> map(lambda j: (j.value, j.done), walkjob(job))
[(1, True), (2, True), (3, True), (5, True)]

Possibly related questions:

What happens if my script causes a stack overflow? Will it take down the server? Can this be contain? — user3590149, Oct 05 '14 at 03:46
Your code will throw `RuntimeError: maximum recursion depth exceeded` error in case of stack overflow. Like any other exception it will bring your application down unless caught (this exception is catchable in Python). — firegurafiku, Oct 05 '14 at 17:52

Python - Mapping job dependencies in tree structure

2 Answers2