I'm trying to remove all duplicates in a collection with ensureIndex and dropDups, but this method doesn't seem to work with arrays. For example, if I have a collection that looks like this:

{ "_id" : ObjectId("54d8f889e3fdfe0cd8b769ed"), "field1" : "a", "field2" : [ "a", "b" ] }
{ "_id" : ObjectId("54d8f89be3fdfe0cd8b769ee"), "field1" : "a", "field2" : [ "a", "b" ] }
{ "_id" : ObjectId("54d8f8a3e3fdfe0cd8b769ef"), "field1" : "a", "field2" : [ "a", "c" ] }
{ "_id" : ObjectId("54d8f8abe3fdfe0cd8b769f0"), "field1" : "a", "field2" : [ "b", "a" ] }
{ "_id" : ObjectId("54d8f8c5e3fdfe0cd8b769f1"), "field1" : "b", "field2" : [ "a", "b" ] }

and use ensureIndex like this:

> db.test.ensureIndex({field1: 1, field2: 1}, {unique: true, dropDups: true})

the result would be:

> db.test.find()
{ "_id" : ObjectId("54d8f89be3fdfe0cd8b769ee"), "field1" : "a", "field2" : [ "a", "b" ] }
{ "_id" : ObjectId("54d8f8c5e3fdfe0cd8b769f1"), "field1" : "b", "field2" : [ "a", "b" ] }

Is there a way to do this so that only exact duplicates (in my example collection, only the first or second entry) get deleted?

  • No, because indexes on arrays are [multikey](http://docs.mongodb.org/manual/core/index-multikey/). You can use an aggregation to identify duplicate documents or update each document with a field whose value is based on `field1` and `field2` and then create a unique index with `dropDups` on that field. – wdberkeley Feb 10 '15 at 15:29
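To illustrate what "identify duplicate documents" could look like, here is a minimal sketch in plain JavaScript. It groups on the client side rather than with an aggregation pipeline, and it runs against an in-memory copy of the sample documents, not a live collection. The helper names `exactKey` and `findExactDuplicateIds` are hypothetical, and the `_id` values are shortened to plain strings for readability.

```javascript
// Sample documents from the question; _id values shortened to plain strings.
const docs = [
  { _id: "ed", field1: "a", field2: ["a", "b"] },
  { _id: "ee", field1: "a", field2: ["a", "b"] },
  { _id: "ef", field1: "a", field2: ["a", "c"] },
  { _id: "f0", field1: "a", field2: ["b", "a"] },
  { _id: "f1", field1: "b", field2: ["a", "b"] },
];

// Hypothetical helper: one string per distinct (field1, field2) pair.
// JSON.stringify keeps array order, so ["a","b"] and ["b","a"] stay different.
function exactKey(doc) {
  return JSON.stringify([doc.field1, doc.field2]);
}

// Hypothetical helper: returns the _id of every document whose key
// was already seen on an earlier document (the first copy is kept).
function findExactDuplicateIds(documents) {
  const seen = new Set();
  const duplicateIds = [];
  for (const doc of documents) {
    const key = exactKey(doc);
    if (seen.has(key)) {
      duplicateIds.push(doc._id);
    } else {
      seen.add(key);
    }
  }
  return duplicateIds;
}

const duplicateIds = findExactDuplicateIds(docs);
console.log(duplicateIds); // [ 'ee' ]
```

Only the second document is reported, because it matches the first one exactly. Against a real collection, the resulting ids could then be passed to something like `db.test.remove({_id: {$in: duplicateIds}})`.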

1 Answer

As far as I know, this feature doesn't work with arrays. Is there any particular reason why you can't just use $addToSet when you insert the data?

Check this question, it may help you: MongoDB: Unique index on array element's property

gasparms
  • well, I'm trying to clean up an existing e-mail dataset – Matthias Wadlinger Feb 10 '15 at 10:54
  • also, $addToSet only prevents duplicates in arrays, I want to prevent duplicate documents that may contain arrays as fields – Matthias Wadlinger Feb 10 '15 at 11:11
  • OK, I see that you use the default ObjectId. You could try composing a custom "_id" from "field1" and "field2": write a function that takes the field and the array, returns a string composed of the two, and assign that string as "_id". If there are duplicates, the insert will throw an exception. – gasparms Feb 10 '15 at 13:46
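The custom "_id" idea from the last comment might look roughly like the plain JavaScript sketch below. It is only an illustration under assumptions: `composeId` and `insertUnique` are hypothetical names, and a `Map` stands in for the collection to mimic the uniqueness that MongoDB always enforces on `_id`. JSON serialization is one possible way to compose the string; it was chosen here because it preserves array order and avoids delimiter collisions.

```javascript
// Hypothetical helper: builds a deterministic _id from field1 and the field2 array.
function composeId(field1, field2) {
  return JSON.stringify([field1, field2]);
}

// Stand-in for a collection: a Map keyed by _id, mimicking the unique _id index.
// Throws when a document with the same composed _id is already stored.
function insertUnique(collection, doc) {
  const id = composeId(doc.field1, doc.field2);
  if (collection.has(id)) {
    throw new Error("duplicate key: " + id);
  }
  collection.set(id, { _id: id, field1: doc.field1, field2: doc.field2 });
}

// The five sample documents from the question, without their ObjectIds.
const incoming = [
  { field1: "a", field2: ["a", "b"] },
  { field1: "a", field2: ["a", "b"] },
  { field1: "a", field2: ["a", "c"] },
  { field1: "a", field2: ["b", "a"] },
  { field1: "b", field2: ["a", "b"] },
];

const collection = new Map();
const rejected = [];
for (const doc of incoming) {
  try {
    insertUnique(collection, doc);
  } catch (err) {
    rejected.push(doc); // exact duplicate of a document already stored
  }
}

console.log(collection.size, rejected.length); // 4 1
```

With the sample data, four documents are stored and only the exact duplicate is rejected. For cleaning an existing dataset, the same approach would mean copying the documents into a new collection with the composed "_id" and ignoring the duplicate key errors.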