
I have a collection of 30000 documents of the form:

{_id: X, src: 1, dst: 2}
{_id: X, src: 1, dst: 3}
{_id: X, src: 1, dst: 4}
{_id: X, src: 1, dst: 5}
{_id: X, src: 1, dst: 6}
{_id: X, src: 1, dst: 7}
...

I transform this collection into the following form:

{_id: 1, listOfNumbers: [2, 3, 4, 5, 6, 7]}
{_id: 2, dst: 0}
{_id: 3, dst: 0}
{_id: 5, dst: 0}
{_id: 7, dst: 0}
{_id: 9, dst: 0}
...

I do this with a MapReduce operation that uses the push method, and it takes 12 s to convert 10000 documents and 75 s to convert 20000 documents.
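The question does not show the map and reduce functions, so the following is only a sketch under assumptions: it mirrors in plain Python a hypothetical map that emits `{listOfNumbers: [dst]}` keyed by `src`, and a reduce that concatenates the partial lists. What it illustrates is MongoDB's re-reduce contract: reduce may be called again on its own output, so it must return the same shape that map emits, and it should only append rather than rebuild arrays. Going from 10000 to 20000 documents raised the time from 12 s to 75 s (about 6×), which may point to work that grows faster than linearly in the reduce step. The names `map_doc`, `reduce_values`, and `toy_map_reduce` are illustrative only.

```python
from collections import defaultdict


def map_doc(doc):
    """Python mirror of a hypothetical JS map:
    function () { emit(this.src, {listOfNumbers: [this.dst]}); }
    """
    yield doc["src"], {"listOfNumbers": [doc["dst"]]}


def reduce_values(key, values):
    """Python mirror of a re-reducible reduce.

    MongoDB may feed reduce's own output back into reduce, so the
    result has the same shape as what map_doc emits.
    """
    merged = []
    for value in values:
        # Append in place instead of rebuilding the list on every step.
        merged.extend(value["listOfNumbers"])
    return {"listOfNumbers": merged}


def toy_map_reduce(docs, chunk_size=2):
    """Toy driver that imitates MapReduce on an in-memory list of documents."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)

    results = []
    for key, values in grouped.items():
        if len(values) == 1:
            # A key emitted only once is not passed to reduce at all.
            final = values[0]
        else:
            # Reduce small chunks, then re-reduce the partial results,
            # to exercise the re-reduce contract.
            partials = [reduce_values(key, values[i:i + chunk_size])
                        for i in range(0, len(values), chunk_size)]
            final = reduce_values(key, partials)
        results.append({"_id": key, "value": final})
    return results


edges = [{"src": 1, "dst": d} for d in (2, 3, 4, 5, 6, 7)]
print(toy_map_reduce(edges))
# [{'_id': 1, 'value': {'listOfNumbers': [2, 3, 4, 5, 6, 7]}}]
```

MongoDB itself wraps each result as `{_id, value}`, which is why the toy driver does the same; the order of values inside a reduce call is not guaranteed by the server.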

Does anybody know how this could be done faster? Do indexes play a role here?
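On the indexing question: one documented case where an index helps MapReduce itself is the command's `sort` option, which orders the input by the emit key so that fewer reduce calls are needed; MongoDB requires the sort key to be in an existing index. The sketch below builds such a command as a Python dict for pymongo's `Database.command`, assuming hypothetical collection names `edges` and `adjacency` and a hypothetical map/reduce pair, since the question's actual code is not shown. Note that `mapReduce` has been deprecated since MongoDB 5.0.

```python
# Hypothetical names: "edges" (source collection) and "adjacency" (output collection).
MAP_JS = "function () { emit(this.src, {listOfNumbers: [this.dst]}); }"

# Reduce that only pushes into one array, returning the same shape map emits.
REDUCE_JS = """function (key, values) {
    var merged = [];
    values.forEach(function (v) {
        v.listOfNumbers.forEach(function (n) { merged.push(n); });
    });
    return {listOfNumbers: merged};
}"""


def build_map_reduce_command(source="edges", target="adjacency"):
    """Build a mapReduce command document; "mapReduce" must be the first key."""
    return {
        "mapReduce": source,
        "map": MAP_JS,
        "reduce": REDUCE_JS,
        # Feed documents to map in emit-key order so equal keys arrive
        # together and reduce runs fewer times; needs an index on src.
        "sort": {"src": 1},
        "out": target,
    }


def run_map_reduce(db, source="edges", target="adjacency"):
    """db is assumed to be a pymongo Database."""
    db[source].create_index("src")  # ascending index on the emit key
    return db.command(build_map_reduce_command(source, target))
```

With pymongo, `run_map_reduce(MongoClient().test)` would create the index and run the command; the database name `test` is again only an example.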

Thank you!

David Robinson
TheAptKid
  • Why don't you use the MongoDB aggregation framework for this? –  Dec 01 '12 at 20:02
  • Besides the fact I don't know what that is, I am pretty sure I shouldn't use it. :) – TheAptKid Dec 01 '12 at 20:11
  • Perhaps you want to google for Mongodb+Aggregation+Framework instead of posting such dumb follow-ups –  Dec 01 '12 at 20:12
  • I'm sorry. I didn't want to offend you. Thank you for your answer. – TheAptKid Dec 01 '12 at 20:40
  • I have to use the MapReduce approach for solving this problem. That was what I meant. – TheAptKid Dec 01 '12 at 20:57
  • You have to use MR? Why???? Three good reasons –  Dec 02 '12 at 06:46
  • I looked at the aggregation framework. It is super fast, but it lacks the ability to save the result directly to MongoDB. – TheAptKid Dec 02 '12 at 10:49
  • You didn't look particularly hard - http://stackoverflow.com/questions/13612028/export-mongodb-aggregation-framework-result-to-a-new-collection – Alex Dec 03 '12 at 13:59
  • I meant directly to MongoDB, without using JS (problem with converting 30000 documents). – TheAptKid Dec 03 '12 at 14:06
  • 'without using JS' - How do you propose to communicate with the MongoDB instance? – Alex Dec 03 '12 at 15:39
  • Sorry, I slipped again. I meant without using JavaScript variables. – TheAptKid Dec 03 '12 at 16:13
  • What's the problem with JavaScript variables? It's either keep your MapReduce, use the aggregation framework, or write some kind of script in your chosen language that iterates over the docs in your collection and saves them to a new collection. Obviously #3 is the slowest and least recommended way. – Alex Dec 03 '12 at 17:08
  • If you are fetching the data out of Mongo then inserting it back in again you're introducing additional network latency which will slow your process down, especially with so much data to process. As @alexjamesbrown suggests, you would be best off using the aggregation framework so it all happens on the server. – Mark Unsworth Dec 04 '12 at 16:02
  • As a partial answer to your question: indexes will only play a part if you are querying by a particular field on the documents (src?), which you don't specify. Without seeing the code you are using to query, it's hard to tell. – Mark Unsworth Dec 04 '12 at 16:04
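The comments recommend the aggregation framework instead. A minimal sketch of the equivalent pipeline, assuming hypothetical collection names: `$group` with `$push` builds one document per `src` holding all of its `dst` values, and an `$out` stage (available from MongoDB 2.6, so later than this thread) writes the result to a collection on the server, which addresses the concern about saving the output without pulling it into client-side variables. The helper `emulate_group_push` is a pure-Python reference for what `$group` computes, included only to make the output shape concrete.

```python
# Hypothetical collection names; the $out stage requires MongoDB 2.6 or later.
PIPELINE = [
    # One output document per src, collecting every dst into an array.
    # The array order follows input order unless a $sort stage comes first.
    {"$group": {"_id": "$src", "listOfNumbers": {"$push": "$dst"}}},
    # Write the grouped documents into a collection on the server.
    {"$out": "adjacency"},
]


def run_pipeline(edges_collection):
    """edges_collection is assumed to be a pymongo Collection."""
    edges_collection.aggregate(PIPELINE)


def emulate_group_push(docs):
    """Pure-Python reference for what the $group stage computes."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc["src"], []).append(doc["dst"])
    return [{"_id": src, "listOfNumbers": dsts} for src, dsts in groups.items()]


edges = [{"src": 1, "dst": d} for d in (2, 3, 4, 5, 6, 7)]
print(emulate_group_push(edges))
# [{'_id': 1, 'listOfNumbers': [2, 3, 4, 5, 6, 7]}]
```

Because the grouping runs entirely on the server, this also avoids the extra network round trips mentioned in the comments about fetching data out and inserting it back.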
