
I have a collection with documents that look like the following:

{
        "_id" : ObjectId("55b377cb66b393427367c3e2"),
        "comment" : "This is a comment",
        "url_key" : "55b377cb66b393427367c3df"  // This is an ObjectId from another record in a different collection
}

I need to find records in this collection that contain duplicate values for both the comment AND the url_key.

I can easily generate (using aggregate) duplicate records for a single key (e.g. comment), but I can't figure out how to group by/aggregate on multiple keys.

Here's my current aggregation pipeline:

db.comments.aggregate([
    { $group: { _id: { comment: "$comment" }, uniqueIds: { $addToSet: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gte: 2 } } },
    { $sort: { count: -1 } },
    { $limit: 10 }
]);
  • Possible duplicate of [Find all duplicate documents in a MongoDB collection by a key field](http://stackoverflow.com/questions/9491920/find-all-duplicate-documents-in-a-mongodb-collection-by-a-key-field) – DhruvPathak Sep 14 '16 at 12:54

1 Answer


Is it as simple as grouping by multiple keys or did I misunderstand your question?

...
{ $group: { _id: { comment: "$comment", url_key: "$url_key" }, count: { $sum: 1 } } },
{ $match: { count: { $gte: 2 } } },
...
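For the collection in the question, a complete pipeline along these lines might look as follows (a sketch assuming the collection is named comments, as in the question's own pipeline; the uniqueIds accumulator is carried over from it so each group also reports which documents are the duplicates):

db.comments.aggregate([
    { $group: {
        _id: { comment: "$comment", url_key: "$url_key" },  // group on both fields together
        uniqueIds: { $addToSet: "$_id" },                    // collect the _ids in each duplicate set
        count: { $sum: 1 }
    } },
    { $match: { count: { $gte: 2 } } },  // keep only groups with more than one document
    { $sort: { count: -1 } },
    { $limit: 10 }
]);

Each result document then carries the duplicated comment/url_key pair in _id and the _ids of the affected documents in uniqueIds.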