I have a field on my user documents called _badges, which is an array of ObjectIds. I am trying to remove duplicate values from the array. I have done so successfully with a Mongoose script that uses async iterators, but it is too slow for my liking:

async function removeDuplicateBadges() {
  // Use async iterator to remove duplicate badge ids
  for await(const userDoc of User.find()) {
    console.log(userDoc.displayName)
    // Skip null entries, then compare ObjectIds by their string form
    const badgesStringArray = userDoc._badges
      .filter(badge => badge !== null)
      .map(badge => badge.toString())
    const uniqueBadgesArray = [...new Set(badgesStringArray)]
    await User.findByIdAndUpdate(
      userDoc._id,
      {
        _badges: uniqueBadgesArray
      }
    )
  }
}

I tried to do the same with the following aggregation command, but it did not remove the duplicate values from the documents stored in the database.

It only returns the deduplicated results, since the aggregation framework is meant to query and transform data, not to mutate the underlying data source:

db.getCollection("users").aggregate(
    [
        { 
            "$unwind" : { 
                "path" : "$_badges"
            }
        }, 
        { 
            "$group" : { 
                "_id" : "$_id", 
                "_badges" : { 
                    "$addToSet" : "$_badges"
                }
            }
        }
    ])
   

Any hints on effective ways to remove the duplicate values, with better time efficiency than the async iterator approach above, would be appreciated.

SKeney
  • These posts have details about how to find duplicates, which can be used to delete them: (1) [Find duplicate urls in MongoDB](https://stackoverflow.com/questions/61062508/find-duplicate-urls-in-mongodb/61072540#61072540) (2) [Remove duplicates from MongoDB 4.2 data base](https://stackoverflow.com/questions/58409232/remove-duplicates-from-mongodb-4-2-data-base/58414376#58414376) – prasad_ Feb 18 '22 at 01:46

1 Answer

Maybe this will help:

"$setUnion": [
      "$_badges",
      []
    ]
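
One way the expression above might be applied to the stored documents, rather than only to returned results, is an update that takes an aggregation pipeline (supported from MongoDB 4.2). The following is a minimal sketch, not a tested migration: the names `dedupeBadgesPipeline` and `dedupeBadgesInPlace` are hypothetical, and `collection` is assumed to be any object exposing `updateMany(filter, update)`, such as a Node.js driver collection (with Mongoose, `User.collection` should give access to one).

```javascript
// Pipeline-style update (MongoDB 4.2+): stages can read the document's own
// fields, so $setUnion can rewrite _badges with its distinct values.
const dedupeBadgesPipeline = [
  { $set: { _badges: { $setUnion: ["$_badges", []] } } }
];

// Hypothetical helper. `collection` is assumed to expose
// updateMany(filter, update), e.g. a Node.js driver collection.
async function dedupeBadgesInPlace(collection) {
  // Only touch documents where _badges is actually an array,
  // because $setUnion raises an error on non-array operands.
  const filter = { _badges: { $type: "array" } };
  return collection.updateMany(filter, dedupeBadgesPipeline);
}
```

In mongosh the equivalent call would be `db.users.updateMany({ _badges: { $type: "array" } }, dedupeBadgesPipeline)`. The whole update runs on the server in a single command, so it avoids one round trip per user. Note that `$setUnion` treats arrays as sets, so the order of the remaining badge ids is not guaranteed to be preserved.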
  • Thanks for the suggestion. I edited the post to be more clear. I'm not just looking to get the duplicate values removed in the returned results. I'm actually looking to remove them in the underlying data source. I could be wrong, but I'm pretty sure the aggregate framework won't work here as it is meant for query and transform functionality NOT mutation. – SKeney Feb 18 '22 at 00:38