
I have a process that I'm currently using Mongo's Map/Reduce framework for, but it's not performing very well. It's a pretty simple aggregation, where I bucketize over 3 fields, returning the sum of 4 different fields, and passing through the values for another 4 fields (which are constant within each bucket).
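In aggregation-framework terms, the whole job is essentially a single $group stage, roughly like the following sketch (the collection and field names here are hypothetical):

// Group on the 3 bucket fields, sum the 4 value fields, and carry the
// 4 per-bucket-constant fields through with $first.
db.source.aggregate( [
    { $group : {
        _id : { k1 : "$k1", k2 : "$k2", k3 : "$k3" },
        sum1 : { $sum : "$v1" },
        sum2 : { $sum : "$v2" },
        sum3 : { $sum : "$v3" },
        sum4 : { $sum : "$v4" },
        c1 : { $first : "$c1" },   // constant within each bucket,
        c2 : { $first : "$c2" },   // so $first just passes it through
        c3 : { $first : "$c3" },
        c4 : { $first : "$c4" }
    } }
] )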

For reasons described in [Map-Reduce performance in MongoDb 2.2, 2.4, and 2.6], I'd like to convert this to the aggregation framework for better performance, but there are 3 things standing in the way, I think:

  1. The total result can be large, exceeding Mongo's 16MB limit, even though any one document in the result is very small.
  2. I can map/reduce directly to another collection, but the aggregation framework can only return results inline (I think?)
  3. For incremental updates as more data arrives in the source collection, I can map/reduce with MapReduceCommand.OutputType (in Java) set to REDUCE, which exactly matches my use case (sketched in the shell just below this list), but I don't see corresponding functionality in the aggregation framework.
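To make point 3 concrete, here is roughly what that incremental pattern looks like in the shell (the collection name, field name, and timestamp filter are hypothetical):

// Map/reduce only the newly arrived documents, re-reducing them into
// the existing output collection instead of recomputing everything.
db.source.mapReduce( mapFn, reduceFn, {
    query : { ts : { $gt : lastRunTime } },   // just the new data
    out : { reduce : "summary" }              // merge into "summary" via reduce
} )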

Are there good ways to solve these in the aggregation framework? The server is version 2.4.3 right now - we can probably update as needed if there are new capabilities.

Neil Lunn
Ken Williams
  • 1) Can you "chunk" the work into smaller segments? 2) Only inline 3) Aggregation is a run once deal (although you could save the results into another collection thru a client). – WiredPrairie Jun 20 '13 at 16:26
  • For 3), aggregation can easily be an incremental operation if your first step is a `{$match: ...}` to select just the new data. – Ken Williams Jun 20 '13 at 16:35
  • The worst part here is that the pipeline is very simple and it runs very fast, but I can't actually use the results. – Ken Williams Jun 20 '13 at 17:22
  • Why can't you use the results? – WiredPrairie Jun 20 '13 at 19:14
  • Because the total size is larger than 16MB. – Ken Williams Jun 20 '13 at 21:08
  • That's why I was asking if you could chunk the aggregation into smaller segments and save them into another collection? – WiredPrairie Jun 20 '13 at 21:58
  • It's possible maybe, but not easy. It's hard to predict the size of any given chunk's output, and the chunking scheme may need to be reworked as the data changes. What I mean by my "worst part" comment is that Mongo seems to have no problem calculating the results, but chokes in actually returning them. – Ken Williams Jun 21 '13 at 02:27
  • How many results at most can be returned from the pipeline in the end? – Asya Kamsky Jun 21 '13 at 16:56

2 Answers


You can do that now with $out, as explained in the MongoDB documentation:

$out takes the documents returned by the aggregation pipeline and writes them to a specified collection. The $out operator lets the aggregation framework return result sets of any size. The $out operator must be the last stage in the pipeline.

The command has the following syntax, where <output-collection> is the collection that will hold the output of the aggregation operation. $out is only permissible at the end of the pipeline:

db.<collection>.aggregate( [
    { <operation> },
    { <operation> },
    ...,
    { $out : "<output-collection>" }
] )
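Note that $out first appeared in MongoDB 2.6, and it replaces the contents of the output collection on each run rather than merging into it, so it corresponds to map/reduce's REPLACE output mode rather than REDUCE.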
lesolorzanov
  • Unfortunately that still doesn't address issue #3 in the original question, but I agree it's a big step forward. – Ken Williams Apr 19 '14 at 02:46
  • Can't use $out with pre 2.6 though. Thoughts on how to handle aggregation to another collection when $out is not available? – conner.xyz Aug 15 '15 at 20:21

The aggregation framework currently cannot output to another collection directly, but you can try the approach in this discussion: SO-questions-output aggregate to new collection. Map/reduce is much slower, and I too have been waiting for a solution. You could also try the Hadoop to MongoDB connector, which is supported on the MongoDB website; Hadoop is faster at map/reduce, but I don't know whether it would be well suited to your specific case.
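For what it's worth, on a 2.4 server the client-side workaround from that discussion looks roughly like the sketch below. The shell's aggregate() helper returns its results inline in a result array there, so the 16MB limit still applies to the aggregation's output; the names and the incremental $match filter are hypothetical:

// Aggregate inline, then upsert each result document into a target
// collection, $inc-ing the sums so that repeated incremental runs
// merge the way map/reduce's REDUCE output mode does.
var res = db.source.aggregate( [
    { $match : { ts : { $gt : lastRunTime } } },   // only the new data
    { $group : { _id : { k1 : "$k1" }, sum1 : { $sum : "$v1" } } }
] );
res.result.forEach( function ( doc ) {
    db.summary.update(
        { _id : doc._id },
        { $inc : { sum1 : doc.sum1 } },
        { upsert : true }
    );
} );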

Link to hadoop + MongoDB connector

All the best.

Sai
  • That solves problem #2, but unfortunately not #1 and #3. I could probably solve #3 by getting the results & reducing manually. But the only way I can think of to solve #1 is to bite off much smaller pieces at a time, which is hard to figure out in advance of the query. – Ken Williams Jun 20 '13 at 16:32
  • Clarification - when I say "that", I meant your first idea. I'll have a look at the Hadoop+MongoDB connector, I haven't looked at it before. Thanks. – Ken Williams Jun 20 '13 at 16:34
  • Yes, breaking it up is probably the only way to solve #1 (sorry, that doesn't help much). But that is mostly impractical as the data gets larger. I have had similar issues in the past with the aggregation framework's limited usability. Hope MongoDB comes up with an upgrade to either Mapreduce or the Aggregation framework. – Sai Jun 20 '13 at 16:38
  • or just wait for 2.6, which will return a cursor from the aggregation framework and allow you to output results to another collection. – Asya Kamsky Jun 21 '13 at 16:47
  • That's great to hear @AsyaKamsky, I didn't know that. Is there a roadmap online somewhere showing what's scheduled to be in 2.6? – Ken Williams Jun 25 '13 at 15:57
  • Yep - you can see it in its crudest form in the "fix version" field of any server ticket: https://jira.mongodb.org/browse/SERVER-3253 which you already saw, and you can see by clicking on the fix version where it fits within the "sequence" of versions - i.e. 2.5.w is the top bucket to go into 2.6 release, before all the 2.5.x and 2.5.desired, etc. – Asya Kamsky Jun 25 '13 at 16:48