
I've mostly learned coding with OOP languages like Java.

I have a personal project where I want to import a bunch of plaintext into MongoDB. I thought I'd try to expand my horizons and do it with JavaScript running on Node.js.

I got the code working fine but I'm trying to figure out why it is executing the way it is.

The output from the console is:

  1. done reading file
  2. closing db
  3. record inserted (n times)

var fs = require('fs'),
    readline = require('readline'),
    // config and db are set up earlier in the script (see the full code link below)
    instream = fs.createReadStream(config.file),
    outstream = new (require('stream'))(),
    rl = readline.createInterface(instream, outstream);

rl.on('line', function (line) {

    var split = line.split(" ");

    _user = "@" + split[0];
    _text = "'" + split[1] + "'";
    _addedBy = config._addedBy;
    _dateAdded = new Date().toISOString();

    quoteObj = { user : _user , text : _text , addedby : _addedBy, dateadded : _dateAdded};

    db.collection("quotes").insertOne(quoteObj, function(err, res) {
        if (err) throw err;
        console.log("record inserted.");
    });
});  
rl.on('close', function () {
    console.log('done reading file.');
    console.log('closing db.');
    db.close();
});

(full code is here: https://github.com/HansHovanitz/Import-Stuff/blob/master/importStuff.js)

When I run it I get the 'done reading file' and 'closing db' messages, and only then all of the 'record inserted' messages. Why is that happening? Is it because of the delay in inserting a record into the db? Since I see 'closing db' first, I'd expect the db to already be closed, so how are the records still being inserted?

Just curious to know why the program is executing in this order for my own peace of mind. Thanks for any insight!

SuperCow

5 Answers


In short, it's because of the asynchronous nature of the I/O operations in the functions you're using, which is quite common in Node.js.

Here's what happens. First, the script reads all the lines of the file, and for each line initiates a db.insertOne() operation, supplying a callback for each of them. Note that a callback is called when its corresponding operation is finished, not in the middle of the process.

Eventually the script reaches the end of the input file, logs two messages, then invokes db.close(). Note that even though the 'insert' callbacks (which log the 'inserted' message) have not been called yet, the database interface has already received all the 'insert' commands.

Now the tricky part: whether the DB interface manages to store all the records (in other words, whether it waits for all the insert operations to complete before closing the connection) depends on the DB interface and its speed. If the write operations are fast enough, you'll probably end up with all the records inserted; if not, you may miss some of them. That's why the safest bet is to close the database connection not in the file's close handler (when reading is complete), but in the insert callbacks (when writing is complete):

let linesCount = 0;
let eofReached = false;

function closeDbIfDone() {
  // close only when the file is fully read AND every insert has finished
  if (linesCount === 0 && eofReached) {
    db.close();
    console.log('database closed');
  }
}

rl.on('line', function (line) {
  ++linesCount;

  // parsing skipped for brevity
  db.collection("quotes").insertOne(quoteObj, function (err, res) {
    --linesCount;
    closeDbIfDone();
    // the rest skipped
  });
});
rl.on('close', function () {
  console.log('reading complete');
  eofReached = true;
  closeDbIfDone(); // covers the case where all inserts finished before EOF
});
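The counting pattern above can be demonstrated without a real database. In this sketch, setTimeout stands in for the asynchronous insertOne call; the pending counter, maybeFinish helper, and sample lines are all made up for the demo:

```javascript
let pending = 0;        // number of outstanding "inserts"
let eofReached = false; // set when "reading" finishes

function maybeFinish() {
  // safe to "close the db" only when both conditions hold
  if (pending === 0 && eofReached) {
    console.log('all inserts done, safe to close db');
  }
}

['a', 'b', 'c'].forEach(function (line) {
  ++pending;
  setTimeout(function () {  // simulated async insert callback
    console.log('inserted:', line);
    --pending;
    maybeFinish();
  }, 10);
});

// the "close" event fires before any of the callbacks above have run
eofReached = true;
console.log('done reading file');
maybeFinish();
```

Running it prints 'done reading file' first, then the three 'inserted' lines, then the closing message, mirroring the order the asker observed.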

This question describes a similar problem, along with several different approaches to solving it.

raina77ow

Welcome to the world of asynchronicity. Inserting into the DB happens asynchronously, which means that the rest of your (synchronous) code will finish executing before that task completes. Consider setTimeout, one of the simplest asynchronous JS functions. It takes two arguments: a function, and a time (in ms) after which to execute that function. In the example below, "hello!" is logged before "set timeout executed", even though the delay is set to 0. Crazy, right? That's because setTimeout is asynchronous.

setTimeout(() => {
  console.log("set timeout executed")
}, 0)

console.log("hello!")

This is one of the fundamental concepts of JS, and it's going to come up all the time, so watch out!

When you call db.collection("quotes").insertOne, you're actually making an asynchronous request to the database. A good way to tell whether a piece of code is asynchronous is to check whether one (or more) of its parameters is a callback.

So the order you're running it is actually expected:

  1. You instantiate rl
  2. You bind your event handlers to rl
  3. Your stream starts processing & calling your 'line' handler
  4. Your 'line' handler opens asynchronous requests
  5. Your stream ends and rl closes

    ...

4.5. Your asynchronous requests return and execute their callbacks

I labelled the callback execution as 4.5 because technically your requests can return at any time after step 4.

I hope this is a useful explanation. Most modern JavaScript relies heavily on asynchronous events, and it can be a little tricky to figure out how to work with them!

Patrick Barr

You're on the right track. The key is that the database calls are asynchronous. As the file is being read, the script starts a bunch of async calls to the database. Since they are asynchronous, the program doesn't wait for them to complete at the time they are called. The file then closes. As the async calls complete, your callbacks run and the console.log statements execute.
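If you'd rather wait for those async calls before closing, a Promise-based sketch is one option. Here fakeInsert simulates a driver whose insertOne returns a Promise when no callback is passed (as the current MongoDB Node.js driver does); the names and timings are made up for the demo:

```javascript
const order = []; // records the order of operations

// stand-in for an insert call that returns a Promise
function fakeInsert(doc) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      order.push('inserted ' + doc.user);
      resolve();
    }, 10);
  });
}

async function run() {
  const docs = [{ user: '@a' }, { user: '@b' }];
  await Promise.all(docs.map(fakeInsert)); // wait for every insert to settle
  order.push('closing db');                // only now is it safe to close
  console.log(order);
}

const done = run();
```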

Steve

Your code reads lines and immediately afterwards makes calls to the db; both are asynchronous processes. When the last line is read, the last request to the db is made, and it takes some time for that request to be processed and for the insertOne callback to be executed. Meanwhile rl has done its job and triggers the close event.

Slim