I'm just starting out with mongo db and trying to make some simple things. I filled up my database with a collections of data containing the "item" property. I wanted to try to count how much time every item is in the collection
example of a document:
{ "_id" : ObjectId("50dadc38bbd7591082d920f0"), "item" : "Pons", "lines" : 37 }
So I designed these two functions for doing MapReduce (written in python using pymongo)
all_map = Code("function () {"
" emit(this.item, 1);"
"}")
all_reduce = Code("function (key, values) {"
" var sum = 0;"
" values.forEach(function(value){"
" sum += value;"
" });"
" return sum;"
"}")
This worked like a charm, so I began filling the collection. At around 30.000 documents, the mapreduce already lasts longer than a second... Because NoSQL is bragging about speed I thought I must have been doing something wrong!
A Question here at Stack Overflow made me check out the Aggregation feature of mongodb. So I tried to use the group + sum + sort thingies. Came up with this:
db.wikipedia.aggregate(
{ $group: { _id: "$item", count: { $sum: 1 } } },
{ $sort: {count: 1} }
)
This code works just fine and gives me the same results as the mapreduce set, but it is just as slow. Am I doing something wrong? Do I really need to use other tools like hadoop to get a better performance?