
I know this has been covered quite a lot on here, but I'm very new to MongoDB and am struggling to apply the answers I've found to my situation.

In short, I have two collections: 'total_by_country_and_isrc', which is the output of a MapReduce function, and 'asset_report', which contains an asset ID that is not present in 'total_by_country_and_isrc' or in the original raw data collection it was MapReduced from.

An example of the data in 'total_by_country_and_isrc' is:

{ "_id" : { "custom_id" : 4748532, "isrc" : "GBCEJ0100080", "country" : "AE" }, "value" : 0 }

And an example of the data in the 'asset_report' is:

{ "_id" : ObjectId("51824ef016f3edbb14ef5eae"), "Asset ID" : "A836656134476364", "Asset Type" : "Web", "Metadata Origination" : "Unknown", "Custom ID" : "4748532", "ISRC" : "", }

I'd like to end up with the following ('total_by_country_and_isrc_with_asset_id'):

{ "_id" : { "Asset ID" : "A836656134476364", "custom_id" : 4748532, "isrc" : "GBCEJ0100080", "country" : "AE" }, "value" : 0 }

I know how I would approach this in a relational database, but I really want to get this working in Mongo, as I'm dealing with some pretty large collections and feel Mongo is the right tool for the job.

Can anyone offer some guidance here?

Raoot
  • You can't "join" two collections like that unfortunately. MapReduce operates on only a single collection (and single document). You may need to store the data more denormalized to get the MapReduce working. – WiredPrairie May 02 '13 at 12:35
  • Yeah, I realise you can't do joins the way you would in a relational DB, but I have seen some examples that seem to imply it's possible with MapReduce. Thanks for the denormalize tip though, I'll look into that. – Raoot May 02 '13 at 13:19
  • Do you have links to the examples you could provide? Maybe I'm missing something about what you're trying to do. You might be able to run two MapReduces and merge/`out` into the same collection (which is what some try). Something like: http://stackoverflow.com/questions/9696940/merging-two-collections-in-mongodb I don't follow your example well enough to say whether that might work though. – WiredPrairie May 02 '13 at 14:05

1 Answer


I think you want the "reduce" output action (see "Output to a Collection with an Action" in the MapReduce documentation). You'll need to regenerate total_by_country_and_isrc, because asset_report doesn't appear to have the fields needed to build the keys you already have in total_by_country_and_isrc, so "joining" the data as it stands is impossible.

First, write a map method that is capable of generating the same keys from the original collection (used to generate total_by_country_and_isrc) and also from the asset_report collection. Think of these keys as the "join" fields.
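As a rough sketch of that first step, assuming the only field the two collections share is the custom ID: the field names on the raw documents (`custom_id`, `isrc`, `country`, `count`), the helper `joinKey` and the value shape `{ asset_id, totals }` are all hypothetical, since the raw collection isn't shown in the question. The `emit` defined here is a local stand-in so the snippet runs outside MongoDB; inside a real `mapReduce` call `emit` is provided for you, and `joinKey` would have to be inlined or passed in through the `scope` option.

```javascript
// Local stand-in for MongoDB's emit(), so the map functions can be tried in plain JavaScript.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

// Both map functions must emit exactly the same key shape.
// Assumption: custom ID is the join field. asset_report stores it as a string,
// the totals collection as a number, so it is normalised to a number here.
function joinKey(customId) {
  return { custom_id: Number(customId) };
}

// Map for the raw data collection (field names are hypothetical).
function mapRaw() {
  var totals = {};
  totals[this.isrc + "|" + this.country] = this.count;
  emit(joinKey(this.custom_id), { asset_id: null, totals: totals });
}

// Map for asset_report (field names taken from the sample document in the question).
function mapAsset() {
  emit(joinKey(this["Custom ID"]), { asset_id: this["Asset ID"], totals: {} });
}

// MongoDB calls map with `this` bound to each document; .call() imitates that.
mapRaw.call({ custom_id: 4748532, isrc: "GBCEJ0100080", country: "AE", count: 0 });
mapAsset.call({ "Asset ID": "A836656134476364", "Custom ID": "4748532", "ISRC": "" });

// Both documents should now share one key.
console.log(JSON.stringify(emitted[0].key) === JSON.stringify(emitted[1].key));
```

Both maps emit values of the same shape on purpose, so that a single reduce function can later merge values from either collection.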

Next, map and reduce your original collection to create total_by_country_and_isrc with the correct keys.

Finally, map asset_report with a map function that emits the same keys you used to generate total_by_country_and_isrc, and send the output to that collection with the "reduce" output action. The reduce function needs to be able to merge, key by key, the mapped data from asset_report with the data already in total_by_country_and_isrc.
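A minimal sketch of such a reduce function, under the assumption that every emitted value has the (hypothetical) shape `{ asset_id, totals }`, where `totals` maps `"<isrc>|<country>"` to a running sum. The counts 3 and 2 are made-up illustration values, and `rawCollection`, `mapRaw` and `mapAsset` in the trailing comment are placeholder names. The second call to `reduceJoin` only imitates locally what the "reduce" output action does when an incoming key already exists in the target collection.

```javascript
// Reduce sketch: merges every value seen for one key into a single value.
// It returns the same shape it receives, because MongoDB may feed the
// output of reduce back into reduce (and does so for out: { reduce: ... }).
function reduceJoin(key, values) {
  var merged = { asset_id: null, totals: {} };
  values.forEach(function (v) {
    // Assumption: at most one asset ID per custom ID; a later one would overwrite.
    if (v.asset_id) { merged.asset_id = v.asset_id; }
    Object.keys(v.totals).forEach(function (k) {
      merged.totals[k] = (merged.totals[k] || 0) + v.totals[k];
    });
  });
  return merged;
}

var key = { custom_id: 4748532 };

// Pass 1: values mapped from the raw collection are reduced and stored.
var stored = reduceJoin(key, [
  { asset_id: null, totals: { "GBCEJ0100080|AE": 3 } },
  { asset_id: null, totals: { "GBCEJ0100080|AE": 2 } }
]);

// Pass 2: a value mapped from asset_report arrives with the same key, so the
// stored value and the incoming value are reduced together.
var incoming = { asset_id: "A836656134476364", totals: {} };
var joined = reduceJoin(key, [stored, incoming]);
console.log(JSON.stringify(joined));

// In the mongo shell the two passes might look roughly like this:
//   db.rawCollection.mapReduce(mapRaw, reduceJoin,
//     { out: { replace: "total_by_country_and_isrc_with_asset_id" } });
//   db.asset_report.mapReduce(mapAsset, reduceJoin,
//     { out: { reduce: "total_by_country_and_isrc_with_asset_id" } });
```

Note that this yields one document per custom ID rather than one per custom ID, ISRC and country, and that keys found in only one of the collections still end up in the output (with a null `asset_id` or empty `totals`), so they would need filtering out afterwards if you only want the intersection.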

Johntron