When should I prefer MapReduce over the aggregation pipeline in MongoDB, or vice versa? I feel most aggregation operations are suitable for the pipeline. What kind of problem complexity or use case should make me go for MapReduce?
- Looks like a dupe of: http://stackoverflow.com/questions/12337319/mongodb-aggregation-comparison-group-group-and-mapreduce – JohnnyHK Aug 24 '14 at 17:14
- @JohnnyHK I don't think so. That question is a lot more specific. – Philipp Aug 24 '14 at 17:57
- @Philipp The question is more specific, but Stennie's answer is much broader. – JohnnyHK Aug 24 '14 at 18:04
1 Answer
As a general rule of thumb: When you can do it with the aggregation pipeline, you should.
One reason is that the aggregation pipeline is able to use indexes and internal optimizations between the aggregation steps which are just not possible with MapReduce.
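To illustrate the index point, here is a minimal sketch of a pipeline written as plain Python dicts. The `orders` collection and the `status`, `customer_id` and `amount` fields are made up for the example. Putting `$match` first means the filter can be served by an index on `status`, much like a regular `find()` query, before any grouping happens.

```python
def totals_for_status(status: str) -> list:
    """Build a pipeline that sums order amounts per customer (hypothetical schema)."""
    return [
        # First stage: a $match at the start of the pipeline can use an index on "status".
        {"$match": {"status": status}},
        # Only the documents that survived the filter are grouped.
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        # Sort the (much smaller) grouped result by total, descending.
        {"$sort": {"total": -1}},
    ]


pipeline = totals_for_status("shipped")
```

Assuming a pymongo collection object, this would be run as `db.orders.aggregate(pipeline)`.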
Aggregation is also a lot more secure when the operation is triggered by user input. When there are any user-supplied parameters to your query, MapReduce forces you to create JavaScript functions through string concatenation. This opens the door to dangerous JavaScript code injection vulnerabilities. The APIs used for creating aggregation pipeline objects (in most programming languages!) usually have fewer such obvious pitfalls.
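A rough sketch of the difference, using plain Python strings and dicts only (no driver calls). The function names, the `customer` and `amount` fields and the malicious input are all invented for illustration. The first function shows the string-concatenation pattern described above; the second keeps the user value as data inside the pipeline document.

```python
def build_map_function_unsafely(customer: str) -> str:
    # Anti-pattern: user input is spliced straight into JavaScript source code.
    return (
        "function() { if (this.customer == '" + customer + "') "
        "{ emit(this.customer, this.amount); } }"
    )


def build_pipeline(customer: str) -> list:
    # Rejecting non-strings guards against operator injection, e.g. a dict
    # such as {"$ne": None} arriving from parsed JSON input.
    if not isinstance(customer, str):
        raise TypeError("customer must be a string")
    # The value is only ever compared as data; it is never parsed as code.
    return [
        {"$match": {"customer": customer}},
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    ]


# Hypothetical attacker-controlled input that closes the string literal early.
malicious = "x' || true) { emit('leak', this); } if ('a' == 'a"

js_source = build_map_function_unsafely(malicious)  # now contains attacker code
pipeline = build_pipeline(malicious)                # still just a strange name
```

With the concatenated version, the attacker's `emit('leak', this)` becomes part of the map function. With the pipeline, the same input is simply a customer name that matches nothing.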
There are, however, still a few cases that cannot be done easily, or at all, with aggregation. For these cases, MapReduce still has a reason to exist.
Another limitation of the aggregation framework is that each pipeline stage is limited to 100 MB of memory unless you use the allowDiskUse option, which lets stages write temporary data to disk and really slows down the query. MapReduce usually behaves a lot better when you need to work with a really large dataset.
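For reference, this is roughly how the option is passed from Python. The helper name and the `amount` field are hypothetical, and `collection` is assumed to be a pymongo `Collection`, whose `aggregate()` method accepts `allowDiskUse` as a keyword argument.

```python
def aggregate_large(collection, pipeline):
    """Run a pipeline that may exceed the per-stage memory limit.

    `collection` is assumed to be a pymongo Collection (or anything with a
    compatible aggregate() method).
    """
    # allowDiskUse=True lets memory-hungry stages spill to temporary files on disk
    # instead of failing; expect the query to be slower.
    return collection.aggregate(pipeline, allowDiskUse=True)


# A sort over a whole large collection is a typical stage that can hit the limit.
sort_heavy_pipeline = [
    {"$sort": {"amount": -1}},
]
```

Usage would look like `for doc in aggregate_large(db.orders, sort_heavy_pipeline): ...`, where `db.orders` is again a made-up collection.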
