In the below query you can see the order of elements in the statuses array this is the actual order which I have in my document.
Query:
db.order_test.aggregate([]);
Result:
{
"_id" : ObjectId("5ea0805cb0b44d2784a70f90"),
"statuses" : [
{
"order" : 3,
"created_on" : ISODate("2019-11-25T18:44:48.930Z"),
"name" : "In Progress"
},
{
"order" : 2,
"created_on" : ISODate("2019-11-25T18:44:55.104Z"),
"name" : "Pending"
},
{
"order" : 2,
"created_on" : ISODate("2019-11-25T18:45:09.022Z"),
"name" : "Sent"
},
{
"order" : 1,
"created_on" : ISODate("2019-11-25T20:04:49.347Z"),
"name" : "Initial Viewed"
},
{
"order" : 6,
"created_on" : ISODate("2019-11-25T20:04:49.347Z"),
"name" : "Viewed"
},
{
"order" : 4,
"created_on" : ISODate("2019-11-25T20:04:49.347Z"),
"name" : "Opened"
},
{
"order" : 2,
"created_on" : ISODate("2019-12-15T05:59:04.719Z"),
"name" : "Abandoned"
}
]
}
Now after applying $setUnion
Query:
db.order_test.aggregate([
{
$addFields: {
statuses: {$setUnion: ['$statuses']}
}
}
]);
Result:
{
"_id" : ObjectId("5ea0805cb0b44d2784a70f90"),
"statuses" : [
{
"order" : 1,
"created_on" : ISODate("2019-11-25T20:04:49.347Z"),
"name" : "Initial Viewed"
},
{
"order" : 2,
"created_on" : ISODate("2019-11-25T18:44:55.104Z"),
"name" : "Pending"
},
{
"order" : 2,
"created_on" : ISODate("2019-11-25T18:45:09.022Z"),
"name" : "Sent"
},
{
"order" : 2,
"created_on" : ISODate("2019-12-15T05:59:04.719Z"),
"name" : "Abandoned"
},
{
"order" : 3,
"created_on" : ISODate("2019-11-25T18:44:48.930Z"),
"name" : "In Progress"
},
{
"order" : 4,
"created_on" : ISODate("2019-11-25T20:04:49.347Z"),
"name" : "Opened"
},
{
"order" : 6,
"created_on" : ISODate("2019-11-25T20:04:49.347Z"),
"name" : "Viewed"
}
]
}
As it can be clearly seen that $setUnion is ordering by first property in the elements which is "order" and then by second property which is "created_on" and then it will probably do by "name" which is the last property of each element inside statuses array.
This behavior is against what is mentioned in documentation https://docs.mongodb.com/manual/reference/operator/aggregation/setUnion/
This ordering is very useful for me, should I trust it?
Scenario I am working on:
"statuses" : [
{
"name" : "In Progress",
"created_on" : ISODate("2019-11-25T18:44:50.302Z")
},
{
"name" : "Pending",
"created_on" : ISODate("2019-11-25T18:44:55.104Z")
},
{
"name" : "Sent",
"created_on" : ISODate("2019-11-25T18:45:19.871Z")
},
{
"name" : "Initial Viewed",
"created_on" : ISODate("2019-11-25T20:08:42.299Z")
},
{
"name" : "Viewed",
"created_on" : ISODate("2019-11-25T20:10:04.016Z")
},
{
"name" : "Pending",
"created_on" : ISODate("2019-11-25T20:49:56.008Z")
},
{
"name" : "Sent",
"created_on" : ISODate("2019-11-26T02:30:17.701Z")
},
{
"name" : "Initial Viewed",
"created_on" : ISODate("2019-11-26T02:30:17.701Z")
},
{
"name" : "Viewed",
"created_on" : ISODate("2019-11-26T02:30:17.701Z")
},
{
"name" : "Opened",
"created_on" : ISODate("2019-11-26T02:30:17.701Z")
},
{
"name" : "Completed",
"created_on" : ISODate("2019-11-26T02:33:56.484Z")
}
],
I have more than 50k docs in my collection with above given property of type array. The problem is, some how i have duplication of status by name, as you can see Pending is appearing twice same goes for Sent Initial Viewed and Viewed.
Required result:
I have to update all the documents which have duplicate status names in statuses array, to remove all duplicate entries. Any status which appeared first should stay and all other duplicates should be removed.
Is there any simple way to do it through Mongo native update query(not javascript)? My first priority is how can I match these records which have duplicate entries by name?