I have a Mongo database that contains an imported flat-file CSV. In SQL, this file should undoubtedly be normalized: the file contains one line per period and the periods contain information that is repetitive. I have created a query that uses the 'push' operator to aggregate (some of) the repetitive information into a single sub-object within the row. This mimics normalization. What I would like to do is restructure the output object so that the sub-object dictionary is uses keys and values on the top level. This, in SQL, is called a Pivot query or Crosstab query. In Excel it is referred to as transposition. Regardless of name, what I'm looking for is the ability to take key-value pairs and use them as 'columns' in Mongo.
Since Mongo and other NoSQL databases are aimed at denormalized implementations, I'm surprised this is so hard.
I'm trying to put the following JSON object into Mongo:
[{ "_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : 1}, {"sub_value": 2}] },
{ "_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : 2}, {"sub_value": 5}] },
{ "_id": {"Date": "1/2/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : 2}, {"sub_value": 4}] },
{ "_id": {"Date": "1/1/2018", "Type": "Orange", "client_id": 1},
"Sub_data": [{"sub_id" : 6}, {"sub_value": 7}] }]
And get the following out:
[{ "_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"1" : 2, "2":5},
{ "_id": {"Date": "1/2/2018", "Type": "Green", "client_id": 1},
"2" : 4},
{ "_id": {"Date": "1/2/2018", "Type": "Orange", "client_id": 1},
"6" : 7}]
Note that I want there to be an arbitrary number of columns to this result. I have looked at a number of solutions that SEEM to approach the problem (Array to object, AddFields, ReplaceRoot, Something like a pivot using static columns) and I've read multiple versions of this 'do it afterwards' code. Is post-processing the only way to do this?
NOTE: This is an attempt to mimic SQL server (and Excel and so on) functionality described in this Stack Overflow question and this TechNet article.
Rolled up, my total pipeline using the first answer's second option looks like this:
db.rate_cards.aggregate(
{
"$group": {
"_id": {
"date": "$date",
"start_date": "$start_date",
"end_date": "$end_date"
},
"code_data": {
"$push": {
"code_str": {"$substr" : ["$code",0,-1]},
"cpm": "$cpm"
}
}
}
},
{
"$group":{
"_id":"$_id",
"data":{
"$mergeObjects":{
"$arrayToObject":[[
{
"k":{"$let":{"vars":{"sub_id_elem":{"$arrayElemAt":["$code_data",0]}},"in":"$$sub_id_elem.code_str"}},
"v":{"$let":{"vars":{"sub_value_elem":{"$arrayElemAt":["$code_data",1]}},"in":"$$sub_value_elem.cpm"}}
}
]]
}
}
}
},
{"$replaceRoot":{"newRoot":{"$mergeObjects":["$_id",{"$arrayToObject":"$data"}]}}}
)
Note that this is a little more complicated and performance-intensive than I'd hoped. It seems to declare a local variable, use an in-clause, and so on. In attempting to run the (working) implementation of the two answers NoSQL booster chokes trying to expand row 600'ish.
A slightly edited version of the original dataset is below. Note that there are a few extra fields not used in the original query, and they have been omitted:
{
"_id" : ObjectId("5a578d5c57d33b197004beed"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 12m-6a",
"dsc" : "Daypart",
"net" : "val1",
"place" : "loc1",
"code" : 12,
"cost" : 16.8
},
{
"_id" : ObjectId("5a578d5c57d33b197004beee"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 12m-6a",
"dsc" : "Daypart",
"net" : "val1",
"place" : "loc3",
"code" : 24,
"cost" : 55.6
},
{
"_id" : ObjectId("5a578d5c57d33b197004beef"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 12n-6p",
"dsc" : "Daypart",
"net" : "val2",
"place" : "loc2",
"code" : 23,
"cost" : 65.5
},
{
"_id" : ObjectId("5a578d5c57d33b197004bef0"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 6p-12m",
"dsc" : "Daypart",
"net" : "val2",
"place" : "loc2",
"code" : 23,
"cost" : 101
}