0

I have a collection in a mongodb. The documents contain an array. I want to know how many element exist in all arrays in all documents. There are a couple of ways of doing this (covered in this question: MongoDB: count the number of items in an array), and I want to know which method is more efficient, and how to tell which method is more efficient.

The documents look like this:

{"foo":[1,2,3,4]}
{"foo":[5,6,7]}

I can get the number I want by running either:

db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])

or

db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])

SO my question is: which method is more efficient, and how do i tell which is more efficient. I have run the aggregate queries with {explain:true} however I didn't find anything especially useful in the output. My feeling is the second method is more efficient, since it doesn't require the unwind, but I was hoping to be able to prove that. The reason for asking is that this example is a scaled down version of a real life scenario with much more, much larger documents.

Also, what are the potential performance impact of running these aggregations on the mongodb? Could the aggregation affect the database performance in general, and the response times of other users?

thanks for your help.

Community
  • 1
  • 1
robjwilkins
  • 5,462
  • 5
  • 43
  • 59
  • Classic open ended question. – Saleem Mar 22 '16 at 12:21
  • One method must be more efficient than the other, and running the aggregate will either affect the performance of the database, or it will not. So i'm not sure how the question is open ended? – robjwilkins Mar 22 '16 at 13:43
  • It's opinionated question. You need to benchmark performance yourself and see what is more efficient and what not. – Saleem Mar 22 '16 at 13:45

1 Answers1

0

I created a collection with over 1million documents each looking like this:

{ "_id" : ObjectId("56f155f6006ec8ceeec407ec"), "foo" : [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 ] }

I then wrote a javascript which runs the different aggregate queries three times, calculates the deltas and prints out the results. The script looks like this:

var start = Date.now();
db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])
var done = Date.now();
var delta1 = done - start;
print(delta1);
var start = Date.now();
db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])
var done = Date.now();
var delta2 = done - start;
print(delta2);
var start = Date.now();
db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])
var done = Date.now();
var delta3 = done - start;
print(delta3);
var avDelta = (delta1 + delta2 + delta3) / 3;
print ("$Unwind query: average query duration (millis):" + avDelta);

var start = Date.now();
db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])
var done = Date.now();
var delta1 = done - start;
print(delta1);
var start = Date.now();
db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])
var done = Date.now();
var delta2 = done - start;
print(delta2);
var start = Date.now();
db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])
var done = Date.now();
var delta3 = done - start;
print(delta3);
var avDelta = (delta1 + delta2 + delta3) / 3;
print ("$Size query: average query duration (millis):" + avDelta);

This returns the following results:

MongoDB shell version: 3.2.0
connecting to: test
2630
2573
2563
$Unwind query: average query duration (millis):2588.6666666666665
2026
2107
2003
$Size query: average query duration (millis):2045.3333333333333

I ran this several times and the unwind query was consistently slower. It would be interesting to see what difference comes from larger documents, as running a similar query against a production collection with ~250k larger more complex documents is taking significantly longer than 2-3seconds.

robjwilkins
  • 5,462
  • 5
  • 43
  • 59