I am storing unordered data in arrays in MongoDB and want to retrieve distinct sets.

For example, my data is:

{message_id: 1, participants: [ 'user1', 'user2'] }
{message_id: 2, participants: [ 'user2', 'user1'] }
{message_id: 3, participants: [ 'user3', 'user4'] }
{message_id: 4, participants: [ 'user4', 'user3'] }

I want to return the arrays as distinct sets (the order of values within each set doesn't matter):

[ 'user1', 'user2' ]
[ 'user3', 'user4' ]

Is it possible to deduplicate unordered arrays like this? This is for a data migration task, so it doesn't have to be performant.

androidnotgenius

1 Answer

You can do this by normalizing each array so that the smaller string comes first, and then grouping by the normalized array. With only 2 elements, a swap is enough; no full sort is needed.

You can also skip $addFields and put the $cond directly inside the $group, but this version is less nested.

db.collection.aggregate([
  {
    "$addFields": {
      "participants": {
        "$cond": [
          {
            "$lt": [
              {
                "$strcasecmp": [
                  {
                    "$arrayElemAt": [
                      "$participants",
                      0
                    ]
                  },
                  {
                    "$arrayElemAt": [
                      "$participants",
                      1
                    ]
                  }
                ]
              },
              0
            ]
          },
          "$participants",
          {
            "$reverseArray": "$participants"
          }
        ]
      }
    }
  },
  {
    "$group": {
      "_id": "$participants"
    }
  },
  {
    "$project": {
      "_id": 0,
      "distinctParticipants": "$_id"
    }
  }
])
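
For comparison, the $group-only variant mentioned above might look roughly like the sketch below. It assumes the same two-element `participants` arrays of strings; `first`, `second`, and `groupOnlyPipeline` are names chosen for this example only.

```javascript
// Sketch: the same two-element swap, evaluated directly in $group's _id.
// Assumes every `participants` array holds exactly two strings.
const first = { $arrayElemAt: ["$participants", 0] };
const second = { $arrayElemAt: ["$participants", 1] };

const groupOnlyPipeline = [
  {
    $group: {
      // Keep the array as-is when first < second, otherwise reverse it,
      // so both orderings collapse into the same group key.
      _id: {
        $cond: [
          { $lt: [{ $strcasecmp: [first, second] }, 0] },
          "$participants",
          { $reverseArray: "$participants" }
        ]
      }
    }
  },
  { $project: { _id: 0, distinctParticipants: "$_id" } }
];

// In the mongo shell: db.collection.aggregate(groupOnlyPipeline)
```

It saves one stage, at the cost of a more deeply nested $group.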

If your arrays can have more than 2 members, you can still do this by sorting all the elements (swapping is just simpler when there are only 2).

Also check this answer if your arrays have more members.
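
One possible way to sort arrays of any length before grouping is the $sortArray operator. This is a sketch under the assumption that you are on MongoDB 5.2 or later, where $sortArray is available; it is not necessarily what the linked answer does, and `sortedSetPipeline` is a name chosen for this example.

```javascript
// Sketch: normalize arrays of any length by sorting them, then group.
// Assumes MongoDB 5.2+ ($sortArray is not available in earlier versions).
const sortedSetPipeline = [
  {
    $addFields: {
      // sortBy: 1 sorts the array's values themselves in ascending order
      participants: { $sortArray: { input: "$participants", sortBy: 1 } }
    }
  },
  // Arrays with the same members now compare equal, whatever their original order
  { $group: { _id: "$participants" } },
  { $project: { _id: 0, distinctParticipants: "$_id" } }
];

// In the mongo shell: db.collection.aggregate(sortedSetPipeline)
```

Unlike $strcasecmp, this comparison is case-sensitive, and since it does not rely on a string-only operator it should also work for non-string members such as ObjectIds.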

Takis
  • Worked like a charm! I failed to mention that I'm actually dealing with ObjectIds, so I used the $cmp aggregation operator instead of $strcasecmp. Thank you! – androidnotgenius Aug 12 '21 at 02:06