4

Let's assume I run this piece of code.

var score = 0;
for (var i = 0; i < arbitrary_length; i++) {
     async_task(i, function() { score++; }); // increment callback function
}

In theory I understand that this presents a data race: two threads trying to increment at the same time may result in a single increment. However, Node.js (and JavaScript) is known to be single-threaded. Am I guaranteed that the final value of score will be equal to arbitrary_length?

bilalba

4 Answers

6

Am I guaranteed that the final value of score will be equal to arbitrary_length?

Yes, as long as all async_task() calls call the callback once and only once, you are guaranteed that the final value of score will be equal to arbitrary_length.

It is the single-threaded nature of Javascript that guarantees that there are never two pieces of Javascript running at the exact same time. Instead, because of the event driven nature of Javascript in both browsers and node.js, one piece of JS runs to completion, then the next event is pulled from the event queue and that triggers a callback which will also run to completion.
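
A minimal sketch of the question's loop, assuming a hypothetical async_task that simply defers its callback with setTimeout (the real async_task is not shown in the question). The runAll wrapper and the pending counter are also illustrative. Each callback runs to completion on its own turn of the event loop, so no increment is lost:

```javascript
// Hypothetical stand-in for the question's async_task: it defers the
// callback with setTimeout and calls it exactly once.
function async_task(i, callback) {
    setTimeout(callback, Math.random() * 10);
}

// Runs the loop from the question and resolves with the final score
// once every callback has fired.
function runAll(arbitrary_length) {
    return new Promise(function (resolve) {
        var score = 0;
        var pending = arbitrary_length;
        if (pending === 0) { resolve(score); return; }
        for (var i = 0; i < arbitrary_length; i++) {
            async_task(i, function () {
                score++;                       // never interleaved with another callback
                if (--pending === 0) resolve(score);
            });
        }
    });
}

runAll(1000).then(function (score) {
    console.log(score === 1000);   // true
});
```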

There is no such thing as interrupt driven Javascript (where some callback might interrupt some other piece of Javascript that is currently running). Everything is serialized through the event queue. This is an enormous simplification and prevents a lot of sticky situations that would otherwise be a lot of work to program safely when you have either multiple threads running concurrently or interrupt driven code.

There still are some concurrency issues to be concerned about, but they have more to do with shared state that multiple asynchronous callbacks can all access. While only one will ever be accessing it at any given time, it is still possible that a piece of code that contains several asynchronous operations could leave some state in an "in between" state while it was in the middle of several async operations at a point where some other async operation could run and could attempt to access that data.
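
To make the "in between" state concrete, here is a small sketch. The delay helper, the balance variable and withdrawUnsafe are all made up for illustration. Each function reads shared state, yields to the event loop, and then writes back a value computed from what is by then a stale read:

```javascript
// Hypothetical helper: resolves after ms milliseconds.
function delay(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

var balance = 100;   // shared state, illustrative only

// Reads shared state, yields to the event loop, then writes back a value
// computed from the (possibly stale) read.
async function withdrawUnsafe(amount) {
    var seen = balance;          // read
    await delay(10);             // control returns to the event loop here
    balance = seen - amount;     // write based on the earlier read
}

// Both calls read 100 before either one writes, so one withdrawal is lost.
var unsafeDone = Promise.all([withdrawUnsafe(30), withdrawUnsafe(30)]).then(function () {
    console.log(balance);        // 70, not 40
    return balance;
});
```

No two lines here ever run at the same time. The problem is only that another piece of code got a turn between the read and the write.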

You can read more about the event driven nature of Javascript here: How does JavaScript handle AJAX responses in the background? and that answer also contains a number of other references.

And another similar answer that discusses the kind of shared data race conditions that are possible: Can this code cause a race condition in socket io?

Some other references:

how do I prevent event handlers to handle multiple events at once in javascript?

Do I need to be concerned with race conditions with asynchronous Javascript?

JavaScript - When exactly does the call stack become "empty"?

Node.js server with multiple concurrent requests, how does it work?


To give you an idea of the concurrency issues that can happen in Javascript (even without threads and without interrupts), here's an example from my own code.

I have a Raspberry Pi node.js server that controls the attic fans in my house. Every 10 seconds it checks two temperature probes, one inside the attic and one outside the house and decides how it should control the fans (via relays). It also records temperature data that can be presented in charts. Once an hour, it saves the latest temperature data that was collected in memory to some files for persistence in case of power outage or server crash. That saving operation involves a series of async file writes. Each one of those async writes yields control back to the system and then continues when the async callback is called signaling completion. Because this is a low memory system and the data can potentially occupy a significant portion of the available RAM, the data is not copied in memory before writing (that's simply not practical). So, I'm writing the live in-memory data to disk.

At any time during any of these async file I/O operations, while waiting for a callback to signify completion of the many file writes involved, one of my timers in the server could fire, I'd collect a new set of temperature data and that would attempt to modify the in-memory data set that I'm in the middle of writing. That's a concurrency issue waiting to happen. If it changes the data while I've written part of it and am waiting for that write to finish before writing the rest, then the data that gets written can easily end up corrupted because I will have written out one part of the data, the data will have gotten modified from underneath me and then I will attempt to write out more data without realizing it's been changed. That's a concurrency issue.

I actually have a console.log() statement that explicitly logs when this concurrency issue occurs on my server (and is handled safely by my code). It happens once every few days on my server. I know it's there and it's real.

There are many ways to work around those types of concurrency issues. The simplest would have been to just make a copy in memory of all the data and then write out the copy. Because there are not threads or interrupts, making a copy in memory would be safe from concurrency (there would be no yielding to async operations in the middle of the copy to create a concurrency issue). But, that wasn't practical in this case. So, I implemented a queue. Whenever I start writing, I set a flag on the object that manages the data. Then, anytime the system wants to add or modify data in the stored data while that flag is set, those changes just go into a queue. The actual data is not touched while that flag is set. When the data has been safely written to disk, the flag is reset and the queued items are processed. Any concurrency issue was safely avoided.
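
The flag-and-queue idea can be sketched roughly as follows. This is not the actual server code. DataStore, its fields, and the writeChunk parameter are hypothetical names, and writeChunk is assumed to be an async function that returns a promise and stands in for the real file writes:

```javascript
// Hypothetical data manager: while a multi-step async save is in flight,
// changes are queued instead of touching the live data.
function DataStore() {
    this.samples = [];     // live in-memory data
    this.saving = false;   // flag: set while a save is in progress
    this.pending = [];     // changes deferred during a save
}

DataStore.prototype.add = function (sample) {
    if (this.saving) {
        this.pending.push(sample);   // do not modify the data mid-save
    } else {
        this.samples.push(sample);
    }
};

// writeChunk is an assumed async function(chunk) that returns a promise.
DataStore.prototype.save = async function (writeChunk) {
    this.saving = true;
    try {
        for (var i = 0; i < this.samples.length; i++) {
            await writeChunk(this.samples[i]);   // yields to the event loop
        }
    } finally {
        this.saving = false;
        // apply everything that arrived while the save was running
        var queued = this.pending;
        this.pending = [];
        for (var j = 0; j < queued.length; j++) {
            this.samples.push(queued[j]);
        }
    }
};

// Usage: a sample that arrives mid-save is applied only after the save ends.
var store = new DataStore();
store.add(1);
store.add(2);
var written = [];
var demo = store.save(function (chunk) {
    return new Promise(function (resolve) {
        setTimeout(function () { written.push(chunk); resolve(); }, 5);
    });
}).then(function () {
    console.log(JSON.stringify(written));         // [1,2]
    console.log(JSON.stringify(store.samples));   // [1,2,3]
});
store.add(3);   // the save is in progress, so this goes into the queue
```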


So, this is an example of concurrency issues that you do have to be concerned about. One great simplifying assumption with Javascript is that a piece of Javascript will run to completion without any threat of getting interrupted as long as it doesn't purposely return control back to the system. That makes handling concurrency issues like described above lots, lots easier because your code will never be interrupted except when you consciously yield control back to the system. This is why we don't need mutexes and semaphores and other things like that in our own Javascript. We can use simple flags (just a regular Javascript variable) like I described above if needed.


In any entirely synchronous piece of Javascript, you will never be interrupted by other Javascript. A synchronous piece of Javascript will run to completion before the next event in the event queue is processed. This is what is meant by Javascript being an "event-driven" language. As an example of this, if you had this code:

 console.log("A");
 // schedule timer for 500 ms from now
 setTimeout(function() {
     console.log("B");
 }, 500);

 console.log("C");

 // spin for 1000ms
 var start = Date.now();
 while(Date.now() - start < 1000) {}

 console.log("D");

You would get the following in the console:

A
C
D
B

The timer event cannot be processed until the current piece of Javascript runs to completion, even though it was likely added to the event queue sooner than that. The way the JS interpreter works is that it runs the current JS until it returns control back to the system and then (and only then), it fetches the next event from the event queue and calls the callback associated with that event.

Here's the sequence of events under the covers.

  1. This JS starts running.
  2. console.log("A") is output.
  3. A timer event is scheduled for 500ms from now. The timer subsystem uses native code.
  4. console.log("C") is output.
  5. The code enters the spin loop.
  6. At some point in time part-way through the spin loop the previously set timer is ready to fire. It is up to the interpreter implementation to decide exactly how this works, but the end result is that a timer event is inserted into the Javascript event queue.
  7. The spin loop finishes.
  8. console.log("D") is output.
  9. This piece of Javascript finishes and returns control back to the system.
  10. The Javascript interpreter sees that the current piece of Javascript is done so it checks the event queue to see if there are any pending events waiting to run. It finds the timer event and a callback associated with that event and calls that callback (starting a new block of JS execution). That code starts running and console.log("B") is output.
  11. That setTimeout() callback finishes execution and the interpreter again checks the event queue to see if there are any other events that are ready to run.
jfriend00
  • Added a number of references. – jfriend00 Sep 06 '16 at 00:34
  • This is a very insightful answer, thank you very much! – bilalba Sep 06 '16 at 01:46
  • "There still are some race conditions to be concerned about, but they have more to do with shared state that multiple asynchronous callbacks can all access." <-- I wouldn't call this a race condition, but rather ordering your async callbacks in a way that they will be executed sequentially (by using a `Promise.then()` chain, for example). It's like if you had 3 functions, one to check that a file exists, one to open it, and one to access its contents. It wouldn't be considered a race condition if you ran the last one first before the second one. – Daniel T. Feb 14 '17 at 03:20
  • @DanielT. - The issue exists (multiple async operations that can access shared state). Your suggestion to serialize all the operations with promises is one possible way to handle that, but there are many others - sometimes you don't want/need to serialize operations, but would just rather design your shared state to be safe to access from several different async handlers in an unpredictable order. – jfriend00 Feb 14 '17 at 04:07
  • @jfriend00 My point is that there are no race conditions in Javascript. Race conditions can only occur if two simultaneously running threads access a shared resource at the same time, which is impossible in Javascript. – Daniel T. Feb 14 '17 at 04:11
  • @DanielT. - You're just arguing about the use of the term "race condition", I guess. I won't argue with you about that. The point is that shared state that can be accessed by multiple async callbacks that aren't purposely ordered has to be done carefully in order to not lead to problems. I refer to that as a potential race condition. If you have a different term you'd rather use, that's your prerogative. – jfriend00 Feb 14 '17 at 04:38
  • @jfriend00 A race condition specifically applies to multi-threading, which Javascript is not. It's not just my prerogative, misusing a common term that applies across multiple languages to mean something else only causes confusion. You wouldn't call a variable a function, so why would you call an ordering issue a race condition? – Daniel T. Feb 14 '17 at 04:56
  • @DanielT. - Apparently, you just like to argue. A race condition does not require threads to occur. Threads is one way it can be caused - there are others (such as interrupts or change of flow due to async operations). I will not respond to you further on this topic. – jfriend00 Feb 14 '17 at 05:03
  • @jfriend00 Please have a read on the Javascript event loop, specifically the "run-to-completion" section: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop which specifically states that a function cannot be pre-empted while it's running (thus no interrupts or flow changes). I'll take your silence as an admittance that you finally understand why JS has no race conditions ;-). – Daniel T. Feb 14 '17 at 05:18
  • @DanielT. - I fully understand the run to completion and I've posted many well accepted answers on that topic myself. Just because there are no threads and no interrupts does not mean you cannot have concurrency issues with shared state. You can anytime your code calls an async operation and then continues in the callback because your code path will essentially return control back to the event loop at that point and other events can run while your last piece of logic was still mid-stream (waiting for an async operation to complete). That creates potential concurrency issues. – jfriend00 Feb 14 '17 at 05:23
  • @DanielT. - See the concurrency example from my own code that I added to the end of my answer. – jfriend00 Feb 14 '17 at 05:34
  • @jfriend00 Ah ok, I see what's going on. I read your answer in more detail and I see that you're using NodeJS's File I/O API, which does have race conditions because it uses the native OS APIs and does not acquire a file lock. It even mentions this in the docs: https://nodejs.org/api/fs.html As for the example you provided at the end of your answer, it's completely correct; callbacks do not fire until the current function scope has ended. As you mentioned, the `while()` loop you added prevents the function scope from ending, therefore the `setTimeout` callback is not fired until afterwards. – Daniel T. Feb 14 '17 at 06:04
  • @DanielT. - The concurrency issues I describe in my node.js server have nothing to do with file locking. They are a contention between async file I/O code in the middle of writing out in-memory data and other timer-driven code that wants to modify that in-memory data and will get to run and potentially modify the in-memory data while the async file I/O code is waiting for async callbacks. The concurrency issues are with the in-memory data, not with the file itself. File locking would not help here - this is not a file locking issue. – jfriend00 Feb 14 '17 at 06:30
  • @DanielT. - The same issue could happen if I was sending the data to a remote server via several http calls (which would likely not even involve any system threads at all). The issue is the usual design of async I/O code in Javascript returns control back to the system while waiting for an async callback and that is an opportunity for other code to run. If that other code could modify shared data that you were in the middle of processing, it creates potential concurrency issues. This is massively simpler to protect against than multiple threads in Java or C++, but still has to be dealt with. – jfriend00 Feb 14 '17 at 06:33
  • @jfriend00 I get what you mean. This is due to how Javascript invokes callback functions. The callback function uses the shared variable's value at the time that it's invoked, not when it's registered. Therefore, it's possible to register a function like `setTimeout(() => { console.log(counter); }, 5000)`, change the value of `counter` before `setTimeout` fires, and have it use the new value. However, it's impossible to change the value of `counter` while the callback function is running. – Daniel T. Feb 14 '17 at 06:54
  • In order to save the shared variable's value at the time that a callback function is registered rather than executed, you need to create a closure: `var closure = (() => { var localCounter = counter; return () => { console.log(localCounter); }; })(); setTimeout(closure, 5000);` This way, the anonymous function is executed immediately, 'saving' the value of the counter while returning another function that can be executed as the callback function at a later time. – Daniel T. Feb 14 '17 at 07:10
  • @DanielT. - Yes, as I said in my answer, saving a copy of the shared data before starting the async operation is one way to work-around the conflicting access. In your example, it was a simple counter which is easy to share. In my case, it was tens of MB of data which was not as simple to just copy so I used a different scheme to protect it from interrupted access. – jfriend00 Feb 14 '17 at 07:25
2

Node uses an event loop. You can think of this as a queue. So we can assume that your for loop puts the function() { score++; } callback on this queue arbitrary_length times. After that, the JS engine runs these one by one and increases score each time. So yes. The only exceptions are if a callback is never called or if the score variable is modified from somewhere else.

Actually, you can use this pattern to run tasks in parallel, collect the results, and call a single callback when every task is done.

var results = [];
for (var i = 0; i < arbitrary_length; i++) {
     async_task(i, function(result) {
          results.push(result);
          if (results.length == arbitrary_length)
               tasksDone(results);
     });
}
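
The same collect-then-continue pattern can be sketched with Promise.all(), assuming a hypothetical async_task that returns a promise instead of taking a callback (runTasks is an illustrative name too). Unlike the push-based version, the results come back in task order rather than completion order, and a rejected task rejects the combined promise instead of leaving it waiting forever:

```javascript
// Hypothetical promise-returning task; the real async_task is not shown.
function async_task(i) {
    return new Promise(function (resolve) {
        // later indexes finish sooner, to show that result order is preserved
        setTimeout(function () { resolve(i * 2); }, Math.max(0, 20 - i));
    });
}

function runTasks(arbitrary_length) {
    var tasks = [];
    for (var i = 0; i < arbitrary_length; i++) {
        tasks.push(async_task(i));
    }
    // Resolves with results in task order; rejects as soon as any task rejects.
    return Promise.all(tasks);
}

runTasks(5).then(function (results) {
    console.log(JSON.stringify(results));   // [0,2,4,6,8]
}).catch(function (err) {
    console.error("a task failed:", err);
});
```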
inf3rno
  • "call a single callback when every task is done" How can I know all tasks have been done. I can only think of busy waiting until score != arbitrary_length – bilalba Sep 06 '16 at 00:08
  • @bilalba Yes that's all you can do. Actually this code does the exact same with the results array length. :-) It's better btw. to add some kind of error handler for the rejected tasks, otherwise you will wait for eternity when an error occurs. – inf3rno Sep 06 '16 at 00:28
  • "How can I know all tasks have been done" <-- This is the exact reason why `Promise.all()` was implemented. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all – Daniel T. Feb 14 '17 at 03:15
  • @DanielT. Yes, I know that. :-) Thanks. – inf3rno Feb 14 '17 at 06:34
1

No two invocations of the function can happen at the same time (b/c node is single threaded) so that will not be a problem. The only problem would be if in some cases async_task(..) drops the callback. But if, e.g., 'async_task(..)' was just calling setTimeout(..) with the given function, then yes, each call will execute, they will never collide with each other, and 'score' will have the value expected, 'arbitrary_length', at the end.

Of course, the 'arbitrary_length' can't be so great as to exhaust memory, or overflow whatever collection is holding these callbacks. There is no threading issue however.

0

I do think it’s worth noting for others that view this, you have a common mistake in your code. For the variable i you either need to use let or reassign to another variable before passing it into the async_task(). The current implementation will result in each function getting the last value of i.
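
A small sketch of the capture behavior this refers to. It shows up when the callback itself reads i after the loop has finished; a value passed as an argument, as in async_task(i, ...), is copied at call time. The function names below are illustrative:

```javascript
// Resolves with the values of i seen by callbacks that run after the loop ends.
function captureWithVar(n) {
    return new Promise(function (resolve) {
        var seen = [];
        for (var i = 0; i < n; i++) {
            setTimeout(function () {
                seen.push(i);                    // one shared i, already equal to n
                if (seen.length === n) resolve(seen);
            }, 0);
        }
    });
}

function captureWithLet(n) {
    return new Promise(function (resolve) {
        var seen = [];
        for (let i = 0; i < n; i++) {
            setTimeout(function () {
                seen.push(i);                    // a fresh i for each iteration
                if (seen.length === n) resolve(seen);
            }, 0);
        }
    });
}

captureWithVar(3).then(function (seen) { console.log(JSON.stringify(seen)); });  // [3,3,3]
captureWithLet(3).then(function (seen) { console.log(JSON.stringify(seen)); });  // [0,1,2]
```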

James Grunewald