I'm new to Node.js. I've been working my way through "Node.js the Right Way" by Jim R. Wilson and I'm running into a contradiction in the book (and in Node.js itself?) that I haven't been able to reconcile to my satisfaction with any amount of googling.
It's stated repeatedly in the book, and in other resources I've looked at online, that in response to some event Node.js runs a callback line by line until completion, and only then does the event loop proceed to wait for, or invoke, the next callback. And because Node.js is single-threaded (and, short of explicitly doing anything with the cluster module, also runs as a single process), my understanding is that at most one chunk of JavaScript code is ever executing at a time.
Am I understanding that correctly? Here's the contradiction (in my mind). How is Node.js so highly concurrent if this is the case?
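To make my mental model concrete, here's a minimal sketch (my own toy example, not from the book) of what I understand "run to completion" to mean: the synchronous code finishes before any callback fires, even a timer with a 0 ms delay.

```javascript
'use strict';

const order = [];

// Schedule a callback "immediately" -- but it still can't run
// until the current synchronous code has run to completion.
setTimeout(function () {
  order.push('callback');
}, 0);

for (let i = 0; i < 3; i += 1) {
  order.push('sync ' + i);
}

// The timer callback has not run yet at this point:
console.log(order); // [ 'sync 0', 'sync 1', 'sync 2' ]
```

Only after this whole script finishes does the event loop get a chance to invoke the timer callback and push 'callback' onto the array.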
Here is an example straight from the book that illustrates my confusion. It is intended to walk a directory of many thousands of XML files and extract the relevant bits of each into a JSON document.
First the parser:
'use strict';
const
  fs = require('fs'),
  cheerio = require('cheerio');

module.exports = function(filename, callback) {
  fs.readFile(filename, function(err, data){
    if (err) { return callback(err); }
    let
      $ = cheerio.load(data.toString()),
      collect = function(index, elem) {
        return $(elem).text();
      };
    callback(null, {
      _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
      title: $('dcterms\\:title').text(),
      authors: $('pgterms\\:agent pgterms\\:name').map(collect),
      subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect)
    });
  });
};
And the bit that walks the directory structure:
'use strict';
const
  file = require('file'),
  rdfParser = require('./lib/rdf-parser.js');

console.log('beginning directory walk');

file.walk(__dirname + '/cache', function(err, dirPath, dirs, files){
  files.forEach(function(path){
    rdfParser(path, function(err, doc) {
      if (err) {
        throw err;
      } else {
        console.log(doc);
      }
    });
  });
});
If you run this code, you will get an EMFILE ("too many open files") error, because the program exhausts all available file descriptors. That would seem to indicate that the program has opened thousands of files concurrently.
My question is... how can this possibly be, unless the event model and/or concurrency model behave differently than how they have been explained?
I'm sure someone out there knows this and can shed light on it, but for the moment, color me very confused!