
I have an API endpoint in Node that uses a simple query to retrieve about 1.8 MB of data from MongoDB (size shown in MongoDB Compass). The query uses an index, and the explain plan shows the query execution time is almost 0 ms. But when I console log the time to retrieve the data in Node, it takes about 200 ms. What is happening between MongoDB finding the results and the results being available in Node? Is there a way to speed this up? The MongoDB server is on the same machine as the Node app. Here is how I'm timing the retrieval in my Node/Express route:

app.get('/myRoute', function(req, res) {

    //get mongodb connection client
    const db = app.locals.db

    //start timer
    let start = new Date()

    //query DB
    db.collection("my_collection").find({year: 2019}).toArray()
    .then((data)=> {

        //log timer shows about 200 ms
        console.log(new Date() - start)

        res.writeHead(200, {'Content-Type': 'application/json'});
        res.end(JSON.stringify(data))
    })
})

My guess is that since the amount of data is of decent size, it takes a while to convert from BSON to JS objects. If so, is there any way to speed this up? I send this to the client as JSON, preferably as gzipped JSON (to speed up the download).
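
For the gzip part, on-the-fly compression doesn't have to be done by hand. A minimal sketch using the compression middleware (assuming the package is installed from npm):

const express = require('express')
const compression = require('compression')

const app = express()

// gzip responses when the client sends Accept-Encoding: gzip;
// applies to every route registered after this middleware
app.use(compression())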

The data are for reporting purposes, but it just seems inefficient to query MongoDB -> convert the BSON to JS objects -> stringify to JSON -> potentially gzip the JSON -> then send to the client.

In production I use a file system with many files and folders of pre-gzipped JSON that the API finds and sends to the client. This is very fast but hard to maintain. Any thoughts? Thanks!
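
For context, the pre-gzipped file approach looks roughly like this (a sketch with a hypothetical route and folder layout, not my exact code):

const fs = require('fs')
const path = require('path')

app.get('/myReport/:year', function(req, res) {

    // hypothetical layout: one pre-gzipped JSON file per year
    const file = path.join(__dirname, 'reports', req.params.year + '.json.gz')

    // the browser decompresses the body itself thanks to Content-Encoding
    res.writeHead(200, {
        'Content-Type': 'application/json',
        'Content-Encoding': 'gzip'
    })
    fs.createReadStream(file).pipe(res)
})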

moki

2 Answers


You will always have some overhead when you compare the time it takes to run a query in mongo itself vs. getting the results via node. You can try streaming the data instead of loading it all, stringifying it, and then sending it over. Something like this might help:

app.get('/myRoute', function(req, res) {

  //get mongodb connection client
  const db = app.locals.db

  //start timer
  console.time("start");

  //query DB and stream the cursor instead of buffering it with toArray()
  const stream = db.collection("my_collection").find({year: 2019}).stream();

  res.writeHead(200, {'Content-Type': 'application/json'});
  res.write('[');

  let first = true;
  stream.on("data", function (doc) {
    //separate documents with commas so the response stays a valid JSON array
    res.write((first ? '' : ',') + JSON.stringify(doc));
    first = false;
  });

  stream.on("error", function (err) {
    console.error(err);
    res.destroy(err);
  });

  stream.on("end", function () {
    res.end(']');
    console.timeEnd("start");
  });
})
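
On the client, the streamed response can be consumed incrementally with the Fetch API instead of waiting for the whole body. A rough sketch (the progressive-parsing step is left as a comment; this only shows reading chunks as they arrive):

fetch('/myRoute')
  .then(async (response) => {
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let json = '';

    // read chunks as they arrive instead of waiting for the full body
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      json += decoder.decode(value, { stream: true });
      // a progressive JSON parser could consume complete documents here
    }

    const data = JSON.parse(json);
    console.log(data.length);
  });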

Ashish Modi
  • I've tried this out, and it helps in that it starts sending data to the client immediately. I will have to rewrite my client JS to consume the stream for a full test. I need to research how to do this, but if you have any leads please let me know. Also, I didn't know you could log time like that. Thank you! – moki Mar 01 '20 at 19:31
  • I parse and store the data in a JS array which is used in an interactive data visualization. I could see your suggestion possibly working by streaming data into the array, because the interaction could use the most recent data first, and by the time the user is able to change the visualization to use the last-loaded data, those data are loaded. – moki Mar 01 '20 at 19:58

The bson-ext module is an alternative BSON parser written in C++. It delivers better deserialization performance, and similar or somewhat better serialization performance, than the pure JavaScript parser.

https://mongodb.github.io/node-mongodb-native/3.4/installation-guide/installation-guide/#troubleshooting

Try installing bson-ext and measuring again.

Benchmarking JSON vs BSON
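
A quick way to see the deserialization cost in isolation is a micro-benchmark against the standalone bson package (a sketch; assumes bson v4+ is installed, and the document shape is made up):

const { serialize, deserialize } = require('bson');

// made-up payload loosely resembling one year of report rows
const docs = Array.from({ length: 10000 }, (_, i) => ({
  year: 2019, row: i, value: Math.random(), label: 'item-' + i
}));

const buffers = docs.map(d => serialize(d));
const json = JSON.stringify(docs);

console.time('BSON deserialize');
buffers.forEach(b => deserialize(b));
console.timeEnd('BSON deserialize');

console.time('JSON.parse');
JSON.parse(json);
console.timeEnd('JSON.parse');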

Valijon
  • I'm saying the explain plan is telling me 0 ms (both in Compass and in node's mongo client using db.collection().find().explain()). I'm pretty convinced the extra time is the parsing of BSON, which is probably happening in the toArray() function. Yes, MongoDB automatically caches the data, but that doesn't help with parsing BSON. – moki Mar 01 '20 at 21:52
  • @moki Please check my answer again. Seems you are heading in the right direction about BSON serialization. – Valijon Mar 01 '20 at 22:31