In MongoDB, I have a collection of people with the schema below. I need to write an aggregation that finds possible duplicates in the database by checking:

  • If another person with same firstName, lastName & currentCompany exists.
  • Or, if another person with the same currentCompany & currentTitle exists.
  • Or, if another person has the same email (which is stored as an object in an array)
  • Or, if someone else has the same linkedIn/twitter url.

Is there a straightforward way of checking for duplicates based on the above cases with a MongoDB aggregation? This question is close to what I'm looking for, but I need to check more than one key/value pair.

{ _id: 'wHwNNKMSL9v3gKEuz',
  firstName: 'John',
  lastName: 'Doe',
  currentCompany: 'John Co',
  currentTitle: 'VP Sanitation',
  emails: 
   [ { address: 'Anais.Grant@hotmail.com',
       valid: true } ],
  urls: 
   { linkedIn: 'http://linkedin.com/johnDoe',
     twitter: 'http://twitter.com/@john', 
   } 
}

Thanks!

Wade Warren

1 Answer

We can achieve this using the $and, $or and $ne operators.

Note: You need to supply one record as input; its values are matched against the other records to identify the duplicates.

I have given a sample query that filters your collection on the following two criteria; you can add the rest of your conditions to get the final result:

  • If another person with same firstName, lastName & currentCompany exists.
  • Or, if someone else has the same linkedIn/twitter url.
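db.yourcollection.find({
  $and: [{
    // Skip the input record itself
    _id: { $ne: 'wHwNNKMSL9v3gKEuz' }
  }, {
    $or: [{
      // Same firstName, lastName and currentCompany
      $and: [
        { firstName: 'John' },
        { lastName: 'Doe' },
        { currentCompany: 'John Co' }
      ]
    }, {
      // Same linkedIn or twitter url
      'urls.linkedIn': 'http://linkedin.com/johnDoe'
    }, {
      'urls.twitter': 'http://twitter.com/@john'
    }]
  }]
})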
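
For the aggregation the question asks about, one possible approach is to group on the candidate keys and keep only groups with more than one document, so no input record has to be fed in. The following is a minimal sketch, not a tested solution: `duplicatePipeline` is a hypothetical helper name, and replacing dots in the group sub-keys is an assumption made to keep the `$group` `_id` field names dot-free.

```javascript
// Hypothetical helper: builds a pipeline that groups documents by the given
// fields and keeps only the groups that contain more than one document.
function duplicatePipeline(fields) {
  const match = {};
  const groupId = {};
  for (const field of fields) {
    // Skip documents where the field is missing or null; otherwise all of
    // them would land in one group and be reported as duplicates.
    match[field] = { $exists: true, $ne: null };
    // 'urls.linkedIn' becomes the sub-key 'urls_linkedIn' (assumption: avoid
    // dots in the sub-key names of the $group _id).
    groupId[field.replace(/\./g, '_')] = '$' + field;
  }
  return [
    { $match: match },
    // Collect the _id of every document sharing the same key values.
    { $group: { _id: groupId, ids: { $push: '$_id' }, count: { $sum: 1 } } },
    // Keep only keys shared by at least two documents.
    { $match: { count: { $gt: 1 } } }
  ];
}

// One pipeline per criterion from the question.
const byNameAndCompany = duplicatePipeline(['firstName', 'lastName', 'currentCompany']);
const byCompanyAndTitle = duplicatePipeline(['currentCompany', 'currentTitle']);
const byLinkedIn = duplicatePipeline(['urls.linkedIn']);
const byTwitter = duplicatePipeline(['urls.twitter']);

// In the mongo shell each pipeline would be run separately, e.g.:
//   db.yourcollection.aggregate(byNameAndCompany)
```

Each result document would carry the shared key values in `_id` and the matching record ids in `ids`. Running one pipeline per criterion keeps each stage simple; the four result sets can then be merged in application code.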
Matt Evans
Clement Amarnath
  • I've thought about doing it this way, but my db has grown to thousands of records and checking each one manually in Node is very CPU intensive. Do you know if a Mongo aggregation could do this more efficiently? – Wade Warren Nov 25 '15 at 09:10
  • The aggregation framework is generally faster than map-reduce. More info at http://stackoverflow.com/questions/13908438/is-mongodb-aggregation-framework-faster-than-map-reduce – Clement Amarnath Nov 25 '15 at 13:41
  • Gotcha, but I'm still quite unsure how I would implement the above with an aggregation. – Wade Warren Nov 25 '15 at 19:28
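
For the email criterion, which is stored as an array of objects in the schema from the question, a possible aggregation sketch is to unwind the array first and then group on the address. The name `duplicateEmailPipeline` is hypothetical, and lower-casing the address with `$toLower` is an assumption (it treats `Anais.Grant@hotmail.com` and `anais.grant@hotmail.com` as the same email).

```javascript
// Sketch of a pipeline that reports email addresses used by more than one person.
const duplicateEmailPipeline = [
  // Emit one document per entry of the emails array.
  { $unwind: '$emails' },
  // Group on the lower-cased address (assumption: case-insensitive match).
  // $addToSet avoids counting a person twice if the same address is listed twice.
  { $group: { _id: { $toLower: '$emails.address' }, ids: { $addToSet: '$_id' } } },
  // 'ids.1' exists only when the array holds at least two distinct _id values.
  { $match: { 'ids.1': { $exists: true } } }
];

// In the mongo shell this would be run as, e.g.:
//   db.yourcollection.aggregate(duplicateEmailPipeline, { allowDiskUse: true })
```

The `allowDiskUse` option is only worth considering if the grouping stage runs into the aggregation memory limit on a large collection.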