I am parsing a CSV file. For each row, I want to check whether a corresponding entry exists in the database: if it does, I want to update it; if it doesn't, I want to insert a new entry.

It is very slow - only around 30 entries per second.

Am I doing something incorrectly?

I am using Node.js, MongoDB, and monk.

    function loadShopsCSV(ShopsName) {
        var filename = 'test.csv';

        csv
            .fromPath(filename)
            .on("data", function(data) {
                // textDateM, weekdayNum, AreaUSArray, stateArray, db and
                // ShopsDBname are defined elsewhere (not shown here).
                var entry = {
                    PeriodEST: Date.parse(data[0]),
                    TextDate: textDateM,
                    ShopId: parseInt(data[1], 10),
                    ShopName: data[2],
                    State: data[3],
                    AreaUS: parseInt(data[4], 10),
                    AreaUSX: AreaUSArray[stateArray.indexOf(data[3])],
                    ProductClass: data[5],
                    Type: data[6],
                    SumNetVolume: parseInt(data[7], 10),
                    Weekday: weekdayNum,
                    WeightedAvgPrice: parseFloat(data[8])
                };

                // One update query is sent per CSV row.
                db.get(ShopsDBname).update(
                    {
                        "PeriodEST": entry.PeriodEST,
                        "ShopName": entry.ShopName,
                        "State": entry.State,
                        "AreaUS": entry.AreaUS,
                        "ProductClass": entry.ProductClass,
                        "Type": entry.Type
                    },
                    {$set: entry},
                    function(err, result) {

                    }
                );
            })
            .on("error", function(err) {
                console.error(err);
            })
            .on("end", function() {
                console.log('finished loading: ' + ShopsName);
            });
    }
LucasSeveryn

2 Answers

First, I would suggest localizing the problem:

  • Replace the `.on("data", function(data) {...})` handler with a dummy `.on("data", function() { return; })` and check how fast the CSV parsing runs on its own.
  • Turn on the MongoDB profiler with `db.setProfilingLevel(1)` and check the slow query log for any query that takes longer than 100 ms.
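The first check can be sketched as a small helper that only counts rows and reports the parsing rate. This is a minimal sketch, not part of any library: `measureThroughput` is a hypothetical name, and it assumes only that the parser behaves like a Node.js event emitter that fires `data` and `end`, as the stream in the question does.

```javascript
// Hypothetical helper: attach a counting-only "data" handler to a stream
// and report how many rows per second the parser alone can deliver.
function measureThroughput(stream, done) {
  var rows = 0;
  var start = Date.now();

  stream
    .on("data", function () {
      rows += 1; // no database work here, parsing speed only
    })
    .on("end", function () {
      var seconds = (Date.now() - start) / 1000;
      done({
        rows: rows,
        seconds: seconds,
        rowsPerSecond: seconds > 0 ? rows / seconds : Infinity
      });
    });

  return stream;
}
```

If the reported rate is far above 30 rows per second, the CSV parser is not the bottleneck and the database side is the next place to look.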

If neither check shows a problem, the bottleneck is in one of the Node.js libraries you are using to prepare and send the query.

Assuming the problem is slow MongoDB queries, you can run `explain` on the update query for details. It may be that the query does not use any index and performs a full collection scan for every update.
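If the collection scan is confirmed, an index covering the fields in the update filter is the usual remedy. The sketch below builds a compound index specification from the six filter fields used in the question. `buildIndexKeys` and `ensureFilterIndex` are hypothetical names, and `collection` is assumed to expose `createIndex(keys)` as the official MongoDB Node.js driver does; the field order shown is an assumption, not a tuned choice.

```javascript
// Fields used in the update filter in the question.
var FILTER_FIELDS = ["PeriodEST", "ShopName", "State", "AreaUS", "ProductClass", "Type"];

// Build an ascending compound index specification, e.g. { PeriodEST: 1, ... }.
function buildIndexKeys(fields) {
  var keys = {};
  fields.forEach(function (field) {
    keys[field] = 1;
  });
  return keys;
}

// Assumption: `collection` has createIndex(keys), as in the official driver.
function ensureFilterIndex(collection) {
  return collection.createIndex(buildIndexKeys(FILTER_FIELDS));
}
```

In the mongo shell, the same specification object would be passed to `createIndex` on the collection that receives the updates.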

Finally, it is recommended to use bulk operations, which were designed for exactly your use case.
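A bulk upsert could look roughly like the following. This is a sketch under stated assumptions: it targets the official MongoDB Node.js driver's `collection.bulkWrite` rather than monk, the helper names (`toUpsertOp`, `chunk`, `bulkUpsert`) are hypothetical, and the batch size of 1000 is an arbitrary starting point.

```javascript
// Fields that identify a row, taken from the update filter in the question.
var KEY_FIELDS = ["PeriodEST", "ShopName", "State", "AreaUS", "ProductClass", "Type"];

// Turn one parsed entry into an updateOne operation with upsert enabled,
// so a matching document is updated and a missing one is inserted.
function toUpsertOp(entry) {
  var filter = {};
  KEY_FIELDS.forEach(function (field) {
    filter[field] = entry[field];
  });
  return { updateOne: { filter: filter, update: { $set: entry }, upsert: true } };
}

// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Assumption: `collection` is an official-driver collection whose
// bulkWrite(ops, options) returns a promise. Batches are sent one at a time.
function bulkUpsert(collection, entries, batchSize) {
  var batches = chunk(entries.map(toUpsertOp), batchSize || 1000);
  return batches.reduce(function (previous, ops) {
    return previous.then(function () {
      return collection.bulkWrite(ops, { ordered: false });
    });
  }, Promise.resolve());
}
```

With this shape, the `data` handler would only push each `entry` into an array, and `bulkUpsert` would be called from the `end` handler (or whenever the array reaches the batch size), replacing one round trip per row with one per batch.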

Alex Blex
  • all my update queries take more than 100ms - I can see it in the mongo console. I couldn't find any example where you update element by element via the bulk way... – LucasSeveryn May 03 '16 at 13:02
  • Unfortunately monk does not support bulk API yet: https://github.com/Automattic/monk/issues/85 – Alex Blex May 03 '16 at 13:24
  • Quick search returns an example for sails: http://stackoverflow.com/questions/32019267/how-to-properly-do-a-bulk-upsert-update-in-mongodb – Alex Blex May 03 '16 at 13:25
  • If changing library is not an option, you need to analyse your database performance. The first step is to update question with result of `db.YourCollection.explain().update(....typical update query...)` – Alex Blex May 03 '16 at 13:30
  • I have no idea what it does, or what it means, but after adding an index on the fields I query by doing `db.shops.createIndex(("PeriodEST":1 .. ` it sped up dramatically – LucasSeveryn May 04 '16 at 07:35
  • Apparently there were no indexes, so every update did full table scan. With correct indexes it picks matching documents from the index. – Alex Blex May 04 '16 at 08:24
Have you tried updating with no write concern? By default, MongoDB blocks until the whole update has succeeded and the database has sent back an acknowledgement. Are you running on a cluster? (If so, you might want to write to the primary node.)

Add `{writeConcern: {w: 0}}` as an extra argument after your `{$set : entry}`.

KaSh
  • I think monk driver only takes two arguments, in this case, `users.update({}, {}, fn);` so I have nowhere to pass the writeconcern argument. It's ignored if I add an extra argument before fn. – LucasSeveryn May 03 '16 at 15:48