I'm having trouble understanding how Node operates regarding its parallel processing and returning values from function calls.

FYI: The gulp function below is merely created as an example for this question.

Is it possible that the function could return the stream before the "Read a large file" statement has finished processing (that is, before the large file has been fully read from the file system and its stream has been added), or is Node smart enough to complete all statements before returning?

function moveFiles(){

    var gulp = require('gulp'),
        stream = require('merge-stream')();

    // Read a large file
    stream.add(gulp.src('src/large-file.txt')
        .pipe(gulp.dest('dest/'))
    );

    // Read a small file
    stream.add(gulp.src('src/small-file.txt')
        .pipe(gulp.dest('dest/'))
    );

    return (stream.isEmpty() ? null : stream);

}
I am me
  • A stream is exactly that: a pointer to a not yet (or only partially) read file. If the function had already read the whole content, what good would a *stream* be? – Bergi Feb 07 '17 at 04:08
  • Welcome to the wonderful world of [asynchronous programming](https://en.wikipedia.org/wiki/Asynchrony_(computer_programming))! – Bergi Feb 07 '17 at 04:11
  • @Bergi Yes, the asynchronous aspect is confusing me. This question isn't related to gulp per se. Another example might be, using node: say the same function read two files from the file system (not using gulp or streams, just simple node `fs` calls), and the function should then return the contents of the two files concatenated together. Is it possible the return statement may fire before both the files have been read? – I am me Feb 07 '17 at 04:16
  • Yes, if you are using the asynchronous `fs` methods (not their synchronous counterparts), that's usually the case. And because they return immediately without having finished reading the files, your `return` statement will be met immediately, which is also the reason why your function cannot yet *return* the concatenated results. At best, it can return a structure that represents the future result. Such a structure can be a promise or a stream. With the help of this structure, you can tell node to do something (execute a *callback* function) when the results are available in the future. – Bergi Feb 07 '17 at 04:20
  • There are, and those who do. two types of javascript programmers, those who don't understand asynchronous code – Jaromanda X Feb 07 '17 at 04:30
  • @bergi asynchronous coding within Node is driving me crazy because I don't know for sure what statements Node decides should be completed before moving onto the next. – I am me Feb 07 '17 at 04:33
  • [Experience, and checking the docs](http://stackoverflow.com/q/21884258/1048572) :-) However, every *statement* is evaluated synchronously. It might start background tasks, but it always returns before completely unrelated things might happen (unlike multithreaded environments). In `a(); b();`, `a` will always return before `b` is evaluated. – Bergi Feb 07 '17 at 04:38

1 Answer

Could Node feasibly return a value from a function call before completing all operations within the function itself?

This is a tricky question. The answer is no, in the sense that returning a value means the function has finished executing: its frame is popped off the call stack and it will never do anything again. It can of course be invoked another time, but this particular invocation is over.

But the tricky part is that only the function itself has finished executing; that doesn't mean it couldn't have scheduled something else to happen in the future. It will get more complicated in a minute, but first a very simple example.

function x() {
    setTimeout(function () {
        console.log('x1');
    }, 2000);
    console.log('x2');
    return;
    console.log('x3');
}

Here, when you call x(), it schedules another function to run after 2 seconds, then prints x2, and then returns, at which point this function cannot do anything else ever again for that invocation.

It means that x3 will never get printed, but x1 will eventually get printed - because it's another function that will be called when the timeout fires. The anonymous function will get called not because the x() function can do anything after it returns, but because it managed to schedule the timeout before it returned.
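To make that ordering visible, here is a minimal sketch that records what happens in an array instead of only logging it. The `events` array, the `'after x()'` marker and the 0 ms delay are my additions for illustration, not part of the example above. The point is that even a zero delay cannot fire before x() has returned and the calling code has finished.

```javascript
// Illustrative only: `events` and the 0 ms delay are assumptions added for this sketch.
const events = [];

function x() {
    setTimeout(function () {
        // Runs later, from the timer callback, not from x() itself.
        events.push('x1');
        console.log(events.join(', '));
    }, 0); // even a 0 ms timer never fires before x() returns
    events.push('x2');
    return;
}

x();
// x() has already returned here, and the timer callback has not run yet.
events.push('after x()');
```

Run with node, this prints `x2, after x(), x1`: every statement in x() and in the calling code completes first, and only then does the scheduled callback get its turn.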

Now, instead of just scheduling things to happen in the future, a function can return a promise that will get resolved some time later. For example:

function y() {
    console.log('y1');
    return new Promise(function (resolve, reject) {
        setTimeout(function () {
            resolve('message from y()');
        }, 2000);
    });
    console.log('y2');
}

Now, when you run:

var promise = y();

what will happen is that y1 will get printed, a new promise will get returned and y2 will never get printed because at that point y() returned and cannot do anything else. But it managed to schedule a timeout that will resolve the promise after two seconds.

You can observe it with:

promise.then(function (value) {
    console.log(value);
});

So with this example you can see that while the y() function itself returned and cannot do anything else, some other (anonymous in this case) function can be called in the future and finish the job that the y() function has initiated.

So I hope now it's clear why it's a tricky question. In a way a function cannot do anything after returning. But it could have scheduled other functions, such as timeouts or event handlers, that can do something after the function returns. And if the thing that the function returns is a promise, then the caller can easily observe the value in the future when it's ready.
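The same idea applies to the two-file case raised in the comments on the question. What follows is only a sketch under some assumptions: it uses Node's built-in `fs.promises.readFile`, the helper name `readBoth` is hypothetical, and the two throwaway files in the temp directory exist only so the snippet can run on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: returns a promise for the concatenated contents.
function readBoth(pathA, pathB) {
    // Both reads are started here; neither has finished when readBoth() returns.
    return Promise.all([
        fs.promises.readFile(pathA, 'utf8'),
        fs.promises.readFile(pathB, 'utf8')
    ]).then(function (contents) {
        return contents[0] + contents[1];
    });
}

// Demo setup: two small throwaway files (names are made up for this sketch).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'demo-'));
const fileA = path.join(dir, 'a.txt');
const fileB = path.join(dir, 'b.txt');
fs.writeFileSync(fileA, 'hello ');
fs.writeFileSync(fileB, 'world');

const pending = readBoth(fileA, fileB);
// The return value is a promise, not the text itself.
console.log(pending instanceof Promise);

pending.then(function (text) {
    // Called later, once both files have been read.
    console.log(text);
});
```

The function returns right away with a promise, and the concatenated text only becomes available inside the `then` callback. That is the "structure that represents the future result" mentioned in the comments.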

All of the examples could be simplified by using the arrow functions but I wanted to make it explicit that those are all separate functions, some of them are named, some are anonymous.
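For comparison, this is roughly what y() could look like with arrow functions. It is a sketch of the shorter style, not a replacement for the explicit version, and it leaves out the unreachable `console.log('y2')` line.

```javascript
// Same behaviour as y() above, written with arrow functions.
const y = () => {
    console.log('y1');
    return new Promise((resolve) => {
        // The timer callback is still a separate function that runs after y() has returned.
        setTimeout(() => resolve('message from y()'), 2000);
    });
};

y().then((value) => console.log(value));
```

It is more compact, but it is less obvious at a glance that three separate functions are involved.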

For more details see some of those answers:

rsp
  • Thanks for the comprehensive explanation. Really helps. So in my case, using the gulp example above, I would first need to understand which of the Node/Gulp APIs and methods are asynchronous and which are synchronous? So I'd need to ask whether the following are synchronous or not: `stream.add()`, `gulp.src()` – I am me Feb 07 '17 at 11:46