I'm just starting off with Node.js and struggling with some of the finer points of non-blocking (asynchronous?) code. I know there are lots of questions about blocking vs non-blocking code already, but after reading through some of them, I still couldn't sort out this issue.
As a learning exercise, I made a simple script that loads URLs from a file, queries them using the request
module, and notifies me if a URL is the New York Times homepage.
Here is a MWE:
// CSV Parse test
'use strict';
var request = require('request');
var fs = require('fs');
var parse = require('csv-parse');
var text = fs.readFileSync('input.txt','utf8');
var r_isnyt = /New York Times/;
var data = [];
parse(text, {skip_empty_lines: true}, function(err, data){
for (var r = 0; r < data.length; r++) {
console.log ('Logging from within parse function:');
console.log ('URL: '+data[r][0]+'\n');
var url = data[r][0];
request(url, function(error, response, body) {
console.log ('Logging from within request function:');
console.log('Loading URL: '+url+'\n');
if (!error && response.statusCode == 200) {
if (r_isnyt.exec(body)){
console.log('This is the NYT site! ');
}
console.log ('');
}
});
}
});
And here is my input.txt
:
http://www.nytimes.com/
www.google.com
From what I understood of non-blocking code, this program's flow would be:
parse(text, {skip_empty_lines: true}, function(err, data){
loads the data and returns the lines of the input file in a 2D array, which is complete and available right after that line.For
Loop iterates through it, loading URLs with the linerequest(url, function(error, response, body) {
, which is non-blocking (right?), so the For loop continues without waiting for the previous URL to finish loading.As a result, you could have multiple URLs being loaded at once, and the
console.log
calls withinrequest
will print in the order the responses are received, not the order of the input file.Within
request
, which has access to the results of the request tourl
, we print the URL, check whether it's the New York Times, and print the result of that check (all blocking steps I thought).
That's a long-winded way of getting around to my question. I just wanted to clarify that I thought I understood the basic concepts of non-blocking code. So what's baffling me is that my output is as follows:
>node parsecsv.js
Logging from within parse function:
URL: http://www.nytimes.com/
Logging from within parse function:
URL: www.google.com
Logging from within request function:
Loading URL: www.google.com
Logging from within request function:
Loading URL: www.google.com
This is the NYT site!
>
I understand why the request
printouts all happen together at the end, but why do they both print Google, and much more baffling, why does the last one say it's the NYT site, when the log line right before it (from within the same request
call) has just printed Google? It's like the request
calls are getting the correct URLs, but the console.log
calls are lagging, and just print everything at the end with the ending values.
Interestingly, if I reverse the order of the URLs, everything looks correct in the output, I guess because of differences in response times from the sites:
node parsecsv.js
Logging from within parse function:
URL: www.google.com
Logging from within request function:
Loading URL: www.google.com
Logging from within parse function:
URL: http://www.nytimes.com/
Logging from within request function:
Loading URL: http://www.nytimes.com/
This is the NYT site!
>
Thanks in advance.
Update
Based on the answer from jfriend00 below, I've changed my code to use a .forEach
loop instead as follows. This appears to fix the issue.
// CSV Parse test
'use strict';
var request = require('request');
var fs = require('fs');
var parse = require('csv-parse');
var text = fs.readFileSync('input.txt','utf8');
var r_isnyt = /New York Times/;
var data = [];
parse(text, {skip_empty_lines: true}, function(err, data){
data.forEach( function(row) {
console.log ('Logging from within parse function:');
console.log ('URL: '+row[0]+'\n');
let url = row[0];
request(url, function(error, response, body) {
console.log ('Logging from within request function:');
console.log('Loading URL: '+url+'\n');
if (!error && response.statusCode == 200) {
if (r_isnyt.exec(body)){
console.log('This is the NYT site! ');
}
console.log ('');
}
});
});
});