
Which part of the syntax tells us that this function should run in another thread and be non-blocking?

Let's consider a simple asynchronous I/O call in node.js:

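var fs = require('fs');
var path = process.argv[2];

fs.readFile(path, 'utf8', function(err, data) {
  if (err) throw err;               // data is undefined when the read fails
  var lines = data.split('\n');
  console.log(lines.length - 1);    // number of newline characters in the file
});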

What exactly is the trick that makes it happen in the background? Could anyone explain it precisely or post a link to a good resource? Everywhere I looked there is plenty of information about what a callback is, but nobody explains why it actually works like that.

This is not a question specifically about node.js; it's about the general concept of callbacks in any programming language.

EDIT:

Probably the example I provided is not the best one here, so let's not consider this node.js snippet. I'm asking in general: what is the trick that lets a program keep executing when it encounters a callback function? What in the syntax makes the callback concept a non-blocking one?

Thanks in advance!

theDC
  • You already have answers, but you can read this too [javascript callback : functions are objects](http://recurial.com/programming/understanding-callback-functions-in-javascript/) – MacKentoch Apr 26 '15 at 21:15
  • There's no way to generalize this. Callbacks run when they're called, and that depends entirely on what you're passing the callback to, regardless of the language. – Dave Newton Apr 26 '15 at 21:15
  • nothing in the code you posted has anything to do with async processing. In fact, JavaScript is not a multi-threaded language, so you can never actually "trigger something to happen in the background". you can, however, **simulate** async behavior by using some third party library like `Q` that easily creates promise chains. Some core node functions may operate on a different thread, but you do not have access to them and cannot control their behavior. – Claies Apr 26 '15 at 21:19
  • Nothing in JavaScript's syntax defines sync v. async. – Dave Newton Apr 26 '15 at 21:27
  • Maybe another way to ask this question would be... If I wanted to write a simple JavaScript function which accepted another function as an argument and invoked that function without blocking the caller, how would I write it? – David Apr 26 '15 at 21:41
  • you wouldn't, since JavaScript is not asynchronous. – Claies Apr 26 '15 at 21:52
  • you could, however, employ one of the many promise libraries available for JavaScript which can ***simulate*** non-blocking operations. – Claies Apr 26 '15 at 21:54
  • based on all your responses in the comments, it's clear that you are asking about a general concept, which isn't even present in the language you tagged or related at all to the code you posted. – Claies Apr 26 '15 at 21:56
  • related, if not duplicate: [Are all javascript callbacks asynchronous? If not, how do I know which are?](http://stackoverflow.com/q/19083357/1048572) – Bergi Apr 26 '15 at 21:57
  • Would you recommend closing this question and creating a new one about the callback concept alone? That might be a good idea – theDC Apr 26 '15 at 21:58
  • @Claies: Promise libraries don't simulate anything ([they're just callbacks](http://stackoverflow.com/a/22562045/1048572)), they do need some asynchronous primitives themselves to make operations non-blocking. – Bergi Apr 26 '15 at 21:59
  • @DCDC It's not obvious what you're asking that hasn't already been answered. Callbacks and asynch/synch are orthogonal. Neither implies either. – Dave Newton Apr 26 '15 at 22:00
  • @Bergi I may have worded it wrong, but my point is that promises, callbacks, etc. are never truly async in JavaScript unless they access platform specific features, which are related to what the function does, not how it's written.... it's all faking it in the language tagged on the question. – Claies Apr 26 '15 at 22:05

3 Answers


There is nothing in the syntax that tells you your callback is executed asynchronously. Callbacks can be asynchronous, such as:

setTimeout(function(){
    console.log("this is async");
}, 100);

or they can be synchronous, such as:

var an_array = [1, 2, 3];
an_array.forEach(function(x){
    console.log("this is sync");
});

So, how can you know if a function will invoke the callback synchronously or asynchronously? The only reliable way is to read the documentation.

You can also write a test to find out if documentation is not available:

var t = "this is async";
some_function(function(){
    t = "this is sync";
});

console.log(t);
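The same test can be wrapped in a small reusable helper. This is a minimal sketch. `callsBackSynchronously` is a hypothetical name, not a standard API. It only reports what happened on the one call it observes, and a badly behaved function might call back synchronously sometimes and asynchronously at other times.

```javascript
// Hypothetical helper: returns true if `someFunction` invoked the callback
// before returning, false otherwise. It only observes this one call.
function callsBackSynchronously(someFunction) {
  var calledSynchronously = false;
  var returned = false;
  someFunction(function () {
    if (!returned) {
      calledSynchronously = true;
    }
  });
  returned = true;
  return calledSynchronously;
}

// Array.prototype.forEach calls its callback synchronously...
var syncResult = callsBackSynchronously(function (cb) {
  [1, 2, 3].forEach(cb);
});

// ...while setTimeout always defers its callback to a later turn of the event loop.
var asyncResult = callsBackSynchronously(function (cb) {
  setTimeout(cb, 0);
});

console.log(syncResult);  // true
console.log(asyncResult); // false
```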

How asynchronous code works

JavaScript, per se, doesn't have any feature to make functions asynchronous. If you want to write an asynchronous function, you have two options:

  1. Use another asynchronous function such as setTimeout or web workers to execute your logic.

  2. Write it in C.
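Here is a minimal sketch of option 1, using a hypothetical helper called `deferCall`. It never blocks its caller because it hands the work to setTimeout, so the callback runs on a later turn of the event loop. The work itself still runs on the main JavaScript thread: setTimeout changes when it runs, not where.

```javascript
// Hypothetical wrapper: schedules `work` to run later and passes its
// result (or any thrown error) to a node-style callback(err, result).
function deferCall(work, callback) {
  setTimeout(function () {
    var result;
    try {
      result = work();
    } catch (err) {
      callback(err);
      return;
    }
    // Called outside the try so errors thrown by the callback itself
    // are not mistaken for errors from `work`.
    callback(null, result);
  }, 0);
}

var log = [];
deferCall(function () { return 6 * 7; }, function (err, value) {
  log.push("callback: " + value);
});
log.push("after deferCall");
// Right now log is ["after deferCall"]; once the timer fires it becomes
// ["after deferCall", "callback: 42"].
```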

As for how C-coded functions (such as setTimeout) implement asynchronous execution: it all (or mostly) has to do with the event loop.

The Event Loop

Inside the web browser there is a piece of code used for networking. Originally, the networking code could only download one thing: the HTML page itself. When Mosaic invented the <img> tag, the networking code evolved to download multiple resources. Then, when Netscape implemented progressive rendering of images, they had to make the networking code asynchronous so that they could draw the page before all images were loaded and update each image progressively and individually. This is the origin of the event loop.

At the heart of the browser is an event loop that evolved from asynchronous networking code. So it's not surprising that it uses an I/O primitive at its core: select() (or something similar such as poll, epoll, etc., depending on the OS).

The select() function in C allows you to wait for multiple I/O operations in a single thread without needing to spawn additional threads. select() looks something like:

select (max, readlist, writelist, errlist, timeout)

To have it wait for I/O (from a socket or disk), you'd add the file descriptor to the readlist, and it will return when there is data available on any of your I/O channels. Once it returns, you can continue processing the data.

The JavaScript interpreter saves your callback and then calls the select() function. When select() returns, the interpreter figures out which callback is associated with which I/O channel and then calls it.
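Below is a toy model of that bookkeeping, written in JavaScript for readability. Everything in it is hypothetical. `fakeSelect` stands in for the real select() system call and simply returns descriptors from a queue that the example fills by hand. A real engine does this in C/C++ and blocks inside the OS until a descriptor is ready.

```javascript
// Map from file descriptor number to the callback waiting on it.
var pending = new Map();

// Hypothetical stand-in for the kernel: descriptors reported as ready.
var readyQueue = [];

// Hypothetical stand-in for select(): returns (and consumes) the ready
// descriptors that are actually being watched.
function fakeSelect(watchedFds) {
  var ready = readyQueue.filter(function (fd) { return watchedFds.includes(fd); });
  readyQueue = readyQueue.filter(function (fd) { return !watchedFds.includes(fd); });
  return ready;
}

// What an async API conceptually does: remember which callback belongs to which fd.
function watch(fd, callback) {
  pending.set(fd, callback);
}

// Simplified event loop: wait for ready fds, then dispatch each to its callback.
function runEventLoop() {
  while (pending.size > 0) {
    var ready = fakeSelect(Array.from(pending.keys()));
    if (ready.length === 0) break; // a real loop would block here instead
    ready.forEach(function (fd) {
      var callback = pending.get(fd);
      pending.delete(fd);
      callback(fd);
    });
  }
}

var order = [];
watch(3, function (fd) { order.push("data on fd " + fd); });
watch(4, function (fd) { order.push("data on fd " + fd); });
order.push("main code keeps running");
readyQueue.push(4, 3); // pretend the kernel reported fd 4, then fd 3
runEventLoop();
// order: ["main code keeps running", "data on fd 4", "data on fd 3"]
```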

Conveniently, select() also allows you to specify a timeout value. By carefully managing the timeout passed to select(), you can cause callbacks to be called at some time in the future. This is how setTimeout and setInterval are implemented. The interpreter keeps a list of all timeouts and calculates what it needs to pass as the timeout to select(). Then, when select() returns, in addition to finding out whether any callbacks need to be called due to an I/O operation, the interpreter also checks for any expired timeouts that need to be called.
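The timer half can be sketched the same way. Again, this is a hypothetical model, not how any particular engine is written. `timers` is a plain array, `nextSelectTimeout` computes the value that would be handed to select(), and `runExpiredTimers` fires whatever is due. The current time is passed in explicitly so the example is deterministic.

```javascript
// Each timer: { due: absolute time in ms, callback: function }.
var timers = [];

// Hypothetical setTimeout-like registration relative to a given "now".
function addTimer(now, delayMs, callback) {
  timers.push({ due: now + delayMs, callback: callback });
}

// How long select() may sleep: until the earliest timer is due,
// or null (block indefinitely) if there are no timers.
function nextSelectTimeout(now) {
  if (timers.length === 0) return null;
  var earliest = Math.min.apply(null, timers.map(function (t) { return t.due; }));
  return Math.max(0, earliest - now);
}

// After select() returns, run every timer whose due time has passed,
// in due-time order, and keep the rest for the next iteration.
function runExpiredTimers(now) {
  var expired = timers.filter(function (t) { return t.due <= now; });
  timers = timers.filter(function (t) { return t.due > now; });
  expired.sort(function (a, b) { return a.due - b.due; });
  expired.forEach(function (t) { t.callback(); });
}

var fired = [];
addTimer(0, 100, function () { fired.push("100ms timer"); });
addTimer(0, 30, function () { fired.push("30ms timer"); });

var timeout = nextSelectTimeout(0); // 30: select() may sleep at most 30 ms
runExpiredTimers(30);               // pretend select() timed out at t = 30
// fired is now ["30ms timer"]; the 100 ms timer is still pending.
```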

So select() alone covers almost all the functionality necessary to implement asynchronous functions. But modern browsers also have web workers. In the case of web workers, the browser spawns threads to execute JavaScript code asynchronously. To communicate back to the main thread, the workers must still interact with the event loop (the select() function).
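Recent Node.js versions have a close analogue of browser web workers in the worker_threads module. Here is a minimal sketch, assuming that module is available. The worker sums the numbers 1 to 1,000,000 on a separate thread. Its result reaches the main thread as a 'message' event delivered through the event loop, so the main thread's own code keeps running in the meantime.

```javascript
const { Worker } = require('worker_threads');

// Code for the worker thread, passed as a string with { eval: true }.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData; i++) sum += i;
  parentPort.postMessage(sum);
`;

const events = [];

// Resolves with the worker's result once its message arrives via the event loop.
const workerResult = new Promise(function (resolve, reject) {
  const worker = new Worker(workerSource, { eval: true, workerData: 1000000 });
  worker.on('message', function (sum) {
    events.push('message from worker');
    resolve(sum);
  });
  worker.on('error', reject);
});

// Runs immediately; the worker's message can only be handled on a later loop turn.
events.push('main thread kept going');
```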

Node.js also spawns threads when dealing with file/disk I/O. When the I/O operation completes it communicates back with the main event loop to cause the appropriate callbacks to execute.


Hopefully this answers your question. I've always wanted to write this answer but was too busy to do so previously. If you want to know more about non-blocking I/O programming in C, I suggest you read this: http://www.gnu.org/software/libc/manual/html_node/Waiting-for-I_002fO.html


Cœur
slebetman
  • Note that node.js uses libev as its event handling library. libev will select (no pun intended) the appropriate method (select, poll, epoll, overlapped I/O) depending on OS at compile time. – slebetman Apr 27 '15 at 01:43
  • This is a fantastic answer. – Euroclydon37 Jul 05 '17 at 04:33
  • @szeb libuv internally will choose the appropriate async I/O library at compile time so it is still `select()` or something similar such as poll, epoll etc. depending on OS so there is no update. Node was using solely libevent when I wrote this answer but that does not change this answer at all since libevent and now libuv still do 100% what this answer is explaining. This is a forever correct answer – slebetman Nov 02 '20 at 14:01
  • @szeb If you want a much deeper explanation of how OSes provide services such as `select`, `poll`, `epoll`, Overlapped I/O (Windows) etc. that libuv uses, check out my much lower-level answer to this related question: https://stackoverflow.com/questions/61262054/is-there-any-other-way-to-implement-a-listening-function-without-an-infinite-w/61826079#61826079 – slebetman Nov 02 '20 at 14:04
  • @slebetman My comment mainly concerned your first comment, I wanted to highlight that readers should google libuv if they are interested. Anyway, thank you for your answer and the additional link you provided – szeb Nov 02 '20 at 14:05
  • **Progressive rendering** means -> https://i.stack.imgur.com/BtI3d.gif – Soner from The Ottoman Empire Jan 31 '21 at 19:37
  • Where Netscape uses the `select` function -> https://github.com/zii/netscape/blob/c19643a913be46222ffb61557b2665bceeb1b956/nsprpub/pr/src/md/windows/w95sock.c#L106 – Soner from The Ottoman Empire Jan 31 '21 at 19:58
  • Thank you for your wonderful answers. I also checked your other answer, but my curiosity about how these things work at a very low level is still unresolved. You referred to `select` and other functions to explain how asynchronous I/O completion and the applications interact, but I'd like to know what is happening under the hood (beneath such functions). – Smart Humanism Sep 03 '22 at 05:51
  • When I/O completes, how does the OS get the callback processed? Does the OS, when interrupted, directly inform the event loop thread, e.g. by adding an event notification object (data) to some queue in user-space memory? Or does the OS store the I/O completion data in an event-related queue in kernel-space memory, so that the event loop has to check (poll) it regularly using system calls? – Smart Humanism Sep 03 '22 at 05:56
  • Or, as a last scenario: does the OS kernel force the event loop to stop temporarily, then itself find and execute a certain piece of code in the program (not the kernel), just like an exception-handling routine registered by the event loop code? – Smart Humanism Sep 03 '22 at 05:59
  • @SmartHumanism Interrupts. It's a hardware feature of your CPU. An interrupt is like a function call but instead of being triggered by software it is triggered by hardware (a signal wire changes voltage). Wikipedia has a fairly good article on interrupts: https://en.wikipedia.org/wiki/Interrupt. In my opinion the best way to understand interrupts is to try designing your own CPU. The basic concept of interrupts is very simple: it is simply a hardcoded function call. The CPU is simply **DESIGNED** to load a hardcoded memory address when a given interrupt happens. – slebetman Sep 04 '22 at 13:30
  • Thank you for your reply, but I know what interrupts are. What I asked is how the designated callback function is run when interrupt happens, or how kernel notifies the event loop under the hood. – Smart Humanism Sep 04 '22 at 13:36
  • @SmartHumanism x86 and ARM have fairly complex interrupts so let's look at a much simpler CPU - the PIC microcontroller (You'll often find them in SIM cards and ATM cards). On a PIC an interrupt always causes a function call to the address 0x0004. So when any hardware event happens (byte received from network, button pressed, timer expires etc.) the CPU will call the function at 0x0004 - what calling a function means is basically the CPU will load the number 4 into the program counter and so the next instruction the CPU will fetch will be located at the address 0x0004... – slebetman Sep 04 '22 at 13:41
  • @SmartHumanism When you develop an OS for the PIC CPU you will need to write a function located at the address 0x0004 to handle the interrupts. Your interrupt handler will need to figure out what happened and what you need to do because that thing happens. After executing your interrupt handler you will need to return from interrupt (it just so happens that the PIC has a dedicated instruction for this that is different from the normal return instruction). When you return from the interrupt whatever the CPU was executing previously will continue to execute. – slebetman Sep 04 '22 at 13:42
  • @SmartHumanism Look at the prototype of `select()`. There is no such thing as `notify the event loop`. That is just a human concept. The OS unpauses the process (wakes it up from sleep) and resumes executing it. When `select()` returns, the fd_set bitfields you passed in have been modified in place to tell the process which descriptors caused it to unfreeze and continue executing (the return value itself is the number of ready descriptors). So if the read set comes back with the value 8, it means the 4th file descriptor (fd 3) is ready, and your code should try to read data from it (8 = 0x08 = 0b00001000). That may either be... – slebetman Sep 04 '22 at 13:46
  • ... a file or a socket. Note that this is how `select()` works. There are other APIs that work differently. For example `epoll()` returns a data structure (I forget what but you can check out the documentation) and BSD's (Mac OS) `kqueue()` returns a different kind of data structure and Windows overlapped I/O does something different. – slebetman Sep 04 '22 at 13:49
  • The callback function is handled by node.js/browser/js engine. If you've read how `select()` works and know how C/C++ works it is fairly obvious how you can design a system that saves callback functions in a list or array or hash table and searches for one when its associated event is returned from `select()` or `epoll()` or `kqueue()` etc. Do you know how to search for data in a linked list? – slebetman Sep 04 '22 at 13:52

A callback is not necessarily asynchronous. Execution depends entirely on how fs.readFile decides to treat the function parameter.
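To make this concrete, here are two hypothetical functions with exactly the same call syntax. Only their bodies decide whether the callback runs before or after the caller continues.

```javascript
// Two hypothetical functions with identical signatures.
function callsNow(value, callback) {
  callback(value);                 // invoked before callsNow returns
}

function callsLater(value, callback) {
  setTimeout(function () {         // invoked on a later event-loop turn
    callback(value);
  }, 0);
}

var trace = [];
callsNow("a", function (v) { trace.push("callback " + v); });
trace.push("after callsNow");
callsLater("b", function (v) { trace.push("callback " + v); });
trace.push("after callsLater");
// Right after this code: ["callback a", "after callsNow", "after callsLater"]
// Once the timer fires, "callback b" is appended.
```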

In JavaScript, you can execute a function asynchronously using for example setTimeout.

Discussion and resources:

How does node.js implement non-blocking I/O?

Concurrency model and Event Loop

Wikipedia:

There are two types of callbacks, differing in how they control data flow at runtime: blocking callbacks (also known as synchronous callbacks or just callbacks) and deferred callbacks (also known as asynchronous callbacks).

Community
ekuusela
  • ok, so maybe this is a bad example, but generally passing a block of code makes it run asynchronously, doesn't it? – theDC Apr 26 '15 at 21:10
  • @DCDC There's no way to generalize that. Callbacks run when they're called, and that depends entirely on what you're passing the callback to. – Dave Newton Apr 26 '15 at 21:15
  • Passing a function does nothing. Something must execute that function and that can be done any way the developer wants. Example: http://jsfiddle.net/pfudnyh6/3/ – ekuusela Apr 26 '15 at 21:16

First of all, if something is not async, it is blocking: the JavaScript runtime stops on that line until the function finishes (that's what readFileSync would do).
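The difference can be observed by recording the order of events. Here is a minimal sketch. It first writes a small temporary file so that it runs anywhere (the name callback-demo.txt is arbitrary), then reads the file both ways.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway file to read (arbitrary name, in the OS temp dir).
const file = path.join(os.tmpdir(), 'callback-demo.txt');
fs.writeFileSync(file, 'line one\nline two\n');

const steps = [];

// Blocking: nothing else runs until the data has been read.
const syncData = fs.readFileSync(file, 'utf8');
steps.push('readFileSync returned ' + syncData.length + ' chars');

// Non-blocking: readFile returns immediately; the callback runs later.
fs.readFile(file, 'utf8', function (err, data) {
  if (err) throw err;
  steps.push('readFile callback got ' + data.length + ' chars');
  fs.unlinkSync(file); // clean up the temporary file
});
steps.push('code after readFile');

// Final order of steps:
// 1. "readFileSync returned 18 chars"
// 2. "code after readFile"
// 3. "readFile callback got 18 chars"
```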

As we all know, fs is an I/O library, and that kind of operation takes time (asking the hardware to read a file is not instantaneous). So it makes a lot of sense for anything that needs more than just the CPU to be async: it takes time, and there is no need to freeze the rest of the code while waiting on another piece of hardware (with the CPU sitting idle).

I hope this solves your doubts.

Juan Pablo
  • Thank you, but I'm asking about something else: which part of the syntax, in general (maybe node.js is a bad example here), says "do it in the background"? – theDC Apr 26 '15 at 21:15
  • @DCDC There's nothing in JavaScript syntax that defines this. – Dave Newton Apr 26 '15 at 21:17
  • Oh, I thought you were asking why. The API takes a callback argument because that's how it was designed, and that design makes sense for the reasons I gave above. If you want to understand how it is async, you could read about multi-threading, or about how the JavaScript engine supports async code by implementing the event loop https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop but if you'd like to really get into the question, it'll be C++ code, not JS – Juan Pablo Apr 26 '15 at 21:20
  • could you explain using C++ then? Thanks! – theDC Apr 26 '15 at 21:44