I'm not sure how to phrase this, but basically I want to group documents by a field from a child array, then I want to group by a field in the parent (root) document, but retaining the prior grouping.
I hope an example would help here.
Let's say I have these documents, where informations about several custItemNum
are almost grouped by originalFile
:
[
{
"items" : [
{
"recType" : "I2",
"qty" : 2.0,
"custItemNum" : 10.0
},
{
"recType" : "I2",
"qty" : 200.0,
"custItemNum" : 20.0
},
{
"recType" : "I2",
"qty" : 50.0,
"custItemNum" : 30.0
},
{
"recType" : "D9",
"custItemNum" : 10.0
},
{
"recType" : "D9",
"custItemNum" : 20.0
},
{
"recType" : "D9",
"custItemNum" : 30.0
}
],
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"items" : [
{
"recType" : "I2",
"qty" : 180.0,
"custItemNum" : 20.0
}
],
"originalFile" : "727557371.txt",
"docId" : "278791399"
},
{
"items" : [
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
}
],
"originalFile" : "727557371.txt",
"docId" : "278791399"
},
{
"items" : [
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
}
],
"originalFile" : "727557371.txt",
"docId" : "278791399"
}
]
I want to end up with a collection like this, where the first grouping is by custItemNumber
and then by originalFile
:
[
{
"custItemNumber" : 10.0,
"count" : 2.0,
"itemInfo" : [
{
"originalFile" : "727451921.txt",
"item" : [
{
"recType" : "I2",
"qty" : 2.0,
"custItemNum" : 10.0
},
{
"recType" : "D9",
"custItemNum" : 10.0
}
]
}
]
},
{
"custItemNumber" : 20.0,
"count" : 3.0,
"itemInfo" : [
{
"originalFile" : "727451921.txt",
"item" : [
{
"recType" : "I2",
"qty" : 200.0,
"custItemNum" : 20.0
},
{
"recType" : "D9",
"custItemNum" : 20.0
}
]
},
{
"originalFile" : "727557371.txt",
"item" : [
{
"recType" : "I2",
"qty" : 180.0,
"custItemNum" : 20.0
}
]
}
]
},
{
"custItemNumber" : 30.0,
"count" : 4.0,
"itemInfo" : [
{
"originalFile" : "727451921.txt",
"item" : [
{
"recType" : "I2",
"qty" : 50.0,
"custItemNum" : 30.0
},
{
"recType" : "D9",
"custItemNum" : 30.0
}
]
},
{
"originalFile" : "727557371.txt",
"item" : [
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
},
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
}
]
}
]
}
]
Bear in mind that these documents already come from several aggregation steps so no useful _id
field is available.
So far I came up with these aggregation stages (and I manually edited its output to obtain the result above):
{$unwind: "$items"},
{$bucket: {
groupBy: "$items.custItemNum",
boundaries: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
output: {
count: {$sum: 1},
itemInfo: {$push: "$$ROOT"}
}
}}
that leads to this result:
[
{
"_id" : 10.0,
"count" : 2.0,
"itemInfo" : [
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "I2",
"qty" : 2.0,
"custItemNum" : 10.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "D9",
"custItemNum" : 10.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
}
]
},
{
"_id" : 20.0,
"count" : 3.0,
"itemInfo" : [
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "I2",
"qty" : 200.0,
"custItemNum" : 20.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "D9",
"custItemNum" : 20.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae5290"),
"items" : {
"recType" : "I2",
"qty" : 180.0,
"custItemNum" : 20.0
},
"originalFile" : "727557371.txt",
"docId" : "278791399"
}
]
},
{
"_id" : 30.0,
"count" : 4.0,
"itemInfo" : [
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "I2",
"qty" : 50.0,
"custItemNum" : 30.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "D9",
"custItemNum" : 30.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae5291"),
"items" : {
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
},
"originalFile" : "727557371.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae5292"),
"items" : {
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
},
"originalFile" : "727557371.txt",
"docId" : "278791399"
}
]
}
]
I'm stuck here, any other step that comes to mind (i.e. a $replaceRoot : { newRoot: "$itemInfo" }
) would break the outer grouping.
Plus, the custItemNum
values are dynamic, but AFAICT the boundaries
field of the $bucket
stage takes a constant array, so if there's a way to pass a computed array there, I'd like to know how.