1

In the below query you can see the order of elements in the statuses array this is the actual order which I have in my document.

Query:

db.order_test.aggregate([]);

Result:

{
    "_id" : ObjectId("5ea0805cb0b44d2784a70f90"),
    "statuses" : [
        {
            "order" : 3,
            "created_on" : ISODate("2019-11-25T18:44:48.930Z"),
            "name" : "In Progress"
        },
        {
            "order" : 2,
            "created_on" : ISODate("2019-11-25T18:44:55.104Z"),
            "name" : "Pending"
        },
        {
            "order" : 2,
            "created_on" : ISODate("2019-11-25T18:45:09.022Z"),
            "name" : "Sent"
        },
        {
            "order" : 1,
            "created_on" : ISODate("2019-11-25T20:04:49.347Z"),
            "name" : "Initial Viewed"
        },
        {
            "order" : 6,
            "created_on" : ISODate("2019-11-25T20:04:49.347Z"),
            "name" : "Viewed"
        },
        {
            "order" : 4,
            "created_on" : ISODate("2019-11-25T20:04:49.347Z"),
            "name" : "Opened"
        },
        {
            "order" : 2,
            "created_on" : ISODate("2019-12-15T05:59:04.719Z"),
            "name" : "Abandoned"
        }
    ]
}

Now after applying $setUnion

Query:

db.order_test.aggregate([
    {
        $addFields: {
            statuses: {$setUnion: ['$statuses']}
        }
    }    
]);

Result:

{
    "_id" : ObjectId("5ea0805cb0b44d2784a70f90"),
    "statuses" : [
        {
            "order" : 1,
            "created_on" : ISODate("2019-11-25T20:04:49.347Z"),
            "name" : "Initial Viewed"
        },
        {
            "order" : 2,
            "created_on" : ISODate("2019-11-25T18:44:55.104Z"),
            "name" : "Pending"
        },
        {
            "order" : 2,
            "created_on" : ISODate("2019-11-25T18:45:09.022Z"),
            "name" : "Sent"
        },
        {
            "order" : 2,
            "created_on" : ISODate("2019-12-15T05:59:04.719Z"),
            "name" : "Abandoned"
        },
        {
            "order" : 3,
            "created_on" : ISODate("2019-11-25T18:44:48.930Z"),
            "name" : "In Progress"
        },
        {
            "order" : 4,
            "created_on" : ISODate("2019-11-25T20:04:49.347Z"),
            "name" : "Opened"
        },
        {
            "order" : 6,
            "created_on" : ISODate("2019-11-25T20:04:49.347Z"),
            "name" : "Viewed"
        }
    ]
}

As it can be clearly seen that $setUnion is ordering by first property in the elements which is "order" and then by second property which is "created_on" and then it will probably do by "name" which is the last property of each element inside statuses array.

This behavior is against what is mentioned in documentation https://docs.mongodb.com/manual/reference/operator/aggregation/setUnion/

enter image description here

This ordering is very useful for me, should I trust it?

Scenario I am working on:

"statuses" : [
        {
            "name" : "In Progress",
            "created_on" : ISODate("2019-11-25T18:44:50.302Z")
        },
        {
            "name" : "Pending",
            "created_on" : ISODate("2019-11-25T18:44:55.104Z")
        },
        {
            "name" : "Sent",
            "created_on" : ISODate("2019-11-25T18:45:19.871Z")
        },
        {
            "name" : "Initial Viewed",
            "created_on" : ISODate("2019-11-25T20:08:42.299Z")
        },
        {
            "name" : "Viewed",
            "created_on" : ISODate("2019-11-25T20:10:04.016Z")
        },
        {
            "name" : "Pending",
            "created_on" : ISODate("2019-11-25T20:49:56.008Z")
        },
        {
            "name" : "Sent",
            "created_on" : ISODate("2019-11-26T02:30:17.701Z")
        },
        {
            "name" : "Initial Viewed",
            "created_on" : ISODate("2019-11-26T02:30:17.701Z")
        },
        {
            "name" : "Viewed",
            "created_on" : ISODate("2019-11-26T02:30:17.701Z")
        },
        {
            "name" : "Opened",
            "created_on" : ISODate("2019-11-26T02:30:17.701Z")
        },
        {
            "name" : "Completed",
            "created_on" : ISODate("2019-11-26T02:33:56.484Z")
        }
    ],

I have more than 50k docs in my collection with above given property of type array. The problem is, some how i have duplication of status by name, as you can see Pending is appearing twice same goes for Sent Initial Viewed and Viewed.

Required result:

I have to update all the documents which have duplicate status names in statuses array, to remove all duplicate entries. Any status which appeared first should stay and all other duplicates should be removed.

Is there any simple way to do it through Mongo native update query(not javascript)? My first priority is how can I match these records which have duplicate entries by name?

Abdul Moiz
  • 1,317
  • 2
  • 17
  • 40
  • `should I trust it?` Nope, that's why the documentation says it is `unspecified` – Joe Apr 23 '20 at 02:35

1 Answers1

1

Soution (v4.2):

db.order_test.update({},
  [
    {$set:{
      statuses: {
        $reduce: {
          input: "$statuses",
          initialValue: [],
          in: {
            $concatArrays: [
              "$$value",
              {
                $cond: [
                  {
                    $eq: [
                      {
                        $size: {
                          $filter: {
                            input: "$$value",
                            as: "uniq",
                            cond: {
                              $eq: [
                                "$$uniq.name",
                                "$$this.name"
                              ]
                            }
                          }
                        }
                      },
                      0
                    ]
                  },
                  [
                    "$$this"
                  ],
                  []
                ]
              }
            ]
          }
        }
      }
    }}
  ]
  {multi:true}
)

MongoPlayground

Workaround (<v4.2):

db.order_test.aggregate([
  {
    $addFields: {
      statuses: {
        $reduce: {
          input: "$statuses",
          initialValue: [],
          in: {
            $concatArrays: [
              "$$value",
              {
                $cond: [
                  {
                    $eq: [
                      {
                        $size: {
                          $filter: {
                            input: "$$value",
                            as: "uniq",
                            cond: {
                              $eq: [
                                "$$uniq.name",
                                "$$this.name"
                              ]
                            }
                          }
                        }
                      },
                      0
                    ]
                  },
                  [
                    "$$this"
                  ],
                  []
                ]
              }
            ]
          }
        }
      }
    }
  }
  //,{$out:"order_test"}
])

MongoPlayground

Note: Uncomment $out operator to override order_test collection with aggregation result.


This ordering is very useful for me, should I trust it?

No

MongoDB’s comparison of BSON objects uses the following order:
 1. Recursively compare key-value pairs in the order that they appear within the BSON object.
... https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#objects

It means:

{                                          {
  "created_on" : ISODate("2019-11-25"),  ≠   "name" : "In Progress",
  "name" : "In Progress"                 ≠   "created_on" : ISODate("2019-11-25"),
}                                          }

"Yes"

  • if you make sure all statuses has the same key-value pairs order.
  • Do not use pymongo, since Python dict (before Python 3.7) and JSON object are unordered collections
Valijon
  • 12,667
  • 4
  • 34
  • 67