1

I'm trying to work out the fastest way of processing a nested tree of jobs that return promises in javascript. I have the following conditions:

  1. The process should return a promise that only resolves when all of the nested jobs are complete.
  2. Each job at the same level needs to be processed in sequence, but children can be processed in parallel.
  3. I can't use await/async.

I have mocked up an example that sums a bunch of numbers and calls a public web service to simulate doing actual work (unfortunately I can't codepen this as the public web service is not available over https):

    function sumNumbers(numbersToSum) {
        return numbersToSum.reduce((p, current) => {
            return p.then(runningTotal => {
                return fetch(`http://numbersapi.com/${current.number}/`)
                    .then(result => {
                        var parentTotal = runningTotal + current.number;
                        if (current.children && current.children.length > 0) {
                            return sumNumbers(current.children)
                                .then(childrenTotal => {
                                    return childrenTotal + parentTotal;
                                });
                        }
                        else {
                            return parentTotal;
                        }
                    });
            });
        }, Promise.resolve(0));
    }

    var numbers = [
        {
            number: 2,
            children: [
                { number: 1 },
                {
                    number: 3,
                    children: [
                        { number: 2 },
                        {
                            number: 1,
                            children: [
                                { number: 1 },
                                { number: 2 }
                            ]
                        }
                    ]
                },
                { number: 2 }
            ]
        },
        { number: 4 },
        { number: 5 },
        {
            number: 3,
            children: [
                {
                    number: 1,
                    children: [
                        { number: 1 }
                    ]
                },
                { number: 2 }
            ]
        }
    ];

    (() => {
        var startTime = new Date();
        sumNumbers(numbers)
            .then(total => {
                var finishTime = new Date();
                console.log(`${total} (took ${((finishTime - startTime) / 1000)}s)`);
            });
    })();

When I run that in my web console i get the following result:

30 (took 2.839s)

This approach works but when a job has children to process the parent waits for the child jobs to finish before resolving and moving on to the next sibling job. I'm wondering whether it would be faster to process the child jobs in parallel?

Is this the case? If so how would you rewrite the example to take advantage of that approach? If not, is there a faster way of doing this in general?

Rich Browne
  • 313
  • 3
  • 11
  • In parallel? In a single-threaded language? What would that entail? – Pointy Feb 25 '19 at 14:13
  • Is javascript always single threaded? – Rich Browne Feb 25 '19 at 14:18
  • "Is javascript always single threaded?" Yes. – Roberto Zvjerković Feb 25 '19 at 14:19
  • 1
    "*Each job at the same level needs to be processed in sequence, but children can be processed in parallel.*" - I don't get it. Children are at the same level, right? – Bergi Feb 25 '19 at 14:19
  • @Pointy I believe he means the fetch requests can be in parallel, and of course certainly possible. Javascript is single threaded, but the browser or node engine most certainly is not. – Keith Feb 25 '19 at 14:20
  • Well yes that's possible; I guess I assumed that if there's a data structure filled with Promise instances that the asynchronous operations had *already* been started. – Pointy Feb 25 '19 at 14:22
  • @Bergi have a look at the `numbers` var in the example, it's a nested tree. – Rich Browne Feb 25 '19 at 14:22
  • Sorry guys, I didn't want people to get hung up over the "but javascript isn't multi-threaded" issue, I just want to know the fastest way of doing this in javascript in the browser. – Rich Browne Feb 25 '19 at 14:23
  • 2
    @RichBrowne Yes, but what parts of the tree do you want to get processed concurrently? You say that consecutive nodes need to be processed sequentially, but arent't children nodes consecutive as well? Do you mean children of different parent nodes? Lets assume I would simply process everything at once, where exactly do you want to introduce a dependency? – Bergi Feb 25 '19 at 14:25
  • 1
    Yes, I'm struggling with what's not & is not concurrent. Is it that it's just the root node that has to be in series, but after that all children at any level can be concurrent. ? eg. `numbers 2,1,2,4,5` is fine, but `2,1,2,5,4` is not – Keith Feb 25 '19 at 14:27
  • Ok is it clearer if I just say "numbers at the same nested level must be processed in sequence"? – Rich Browne Feb 25 '19 at 14:30
  • 1
    But by that definition, doesn't that just mean everything has to be processed in sequence? eg. doing the first top 1 level, cannot do the next until the children of the first are done, and then the children of the first level also need to be in sequence. – Keith Feb 25 '19 at 14:32
  • @ritaj have a [read](https://stackoverflow.com/questions/2734025/is-javascript-guaranteed-to-be-single-threaded) – Rich Browne Feb 25 '19 at 14:32
  • @Keith ok a different way of putting it, in the example `sumNumbers` function I'm using `reduce` so that each number is processed in sequence. The alternative would be to use `map` which could be faster but would then be out of sequence. Also in the example, children are processed by recursing and it waits for the recurision to finish before resolving the parent. My quesiton/point was would it be faster to resolve the parent straight away after kicking off the child job, then summing the result later? – Rich Browne Feb 25 '19 at 14:36
  • @RichBrowne Javascript is still single threaded, that Op's answer is misleading if he say's is multi-threaded, because an event fired straight away. All that happened there was the `focus()` method fired the event. And that doesn't require mullt-threading to do it. – Keith Feb 25 '19 at 14:37
  • `then summing the result later?` That's not going to speed anything up no. The slowest part of your process is the `fetch`, and your code does this in sequence,. IOW: If each fetch takes 1 second, your result will take 30 seconds to complete. But do you need to do the fetches in sequence, if it's just the order that returned you can push the promises onto an array and `Promise.all()` , the sequence from `promise.all` is maintained. This would of course be much faster. – Keith Feb 25 '19 at 16:35
  • @Keith no unfortunately it's not just the order, each job needs to complete in sequence to avoid race conditions. As I mentioned before, only the fetches at the same nested level need to be done in squence. Using the `numbers` var from the example, there are 4 fetches at first nested level, 5 at the second level, 3 at the third and 2 at the fourth. Taking the first nested level, those 5 fetches must be done in sequence BUT at the same time the fetches from the other levels could be be executed in parallel, which would save time overall right? – Rich Browne Feb 25 '19 at 18:30
  • Or being more explicit, the first fetch from the first level is executing at the same time as the first fetches from the second, third and fourth. That's 4 requests executing in parallel rather than in serial (as in the example code in the op) which would surely save significant time when scaled out to a bigger example. Does that make sense? – Rich Browne Feb 25 '19 at 18:34

1 Answers1

2

Would it be faster to resolve the parent straight away after kicking off the child job, then summing the result later?

Yes, it would. If understand correctly, you would want only sibling nodes to be processed sequentially, and have each node wait for its predecessors before starting the work of its children, but not have a node wait for its children to finish.

Or being more explicit, the first fetch from the first level is executing at the same time as the first fetches from the second, third and fourth. That's 4 requests executing in parallel rather than in serial (as in the example code in the op) which would surely save significant time when scaled out to a bigger example.

Yes, that would be faster, but not significantly - a constant factor at most. You would still have to sequentially process all the leaves of the tree.

The code for this is trivial though - you'd just replace the fetch(node).then(processChildren) with a concurrent version:

return Promise.all([
    fetch(`http://numbersapi.com/${current.number}/`),
    current.children && current.children.length > 0
      ? sumNumbers(current.children)
      : 0
}).then(([result, childrenTotal]) => {
//        ^^^^^^
    var parentTotal = runningTotal + current.number;
    return childrenTotal + parentTotal;
});

I notice you never actually use the result, so I wonder why you're fetching anything at all :-)

Bergi
  • 630,263
  • 148
  • 957
  • 1,375