3

If a collection have a list of dogs, and there is duplicate entries on some races. How do i remove all, but a single specific/non specific one, from just one query?

I guess it would be possible to get all from a Model.find(), loop through every index except the first one and call Model.remove(), but I would rather have the database handle the logic through the query. How would this be possible?

pseudocode example of what i want:

Model.remove({race:"pitbull"}).where(notFirstOne);
Jonas Grønbek
  • 1,709
  • 2
  • 22
  • 49
  • Could you show your data and expected output ? – mickl Jan 21 '19 at 12:21
  • Can you explain more with some real code of what you what you are doing? This will not work to remove all except some Model.remove({race:"pitbull"}).where(notFirstOne) – Gary Jan 21 '19 at 13:51

2 Answers2

1

To remove all but one, you need a way to get all the filtered documents, group them by the identifier, create a list of ids for the group and remove a single id from this list. Armed with this info, you can then run another operation to remove the documents with those ids. Essentially you will be running two queries.

The first query is an aggregate operation that aims to get the list of ids with the potentially nuking documents:

(async () => {
    // Get the duplicate entries minus 1
    const [doc, ...rest] = await Module.aggregate([
        { '$match': { 'race': 'pitbull'} },
        { '$group': {
            '_id': '$race',
            'ids': { '$push': '$_id' },
            'id': { '$first': '$_id' }
        } },
        { '$project': { 'idsToRemove': { '$setDifference': [ ['$id'], '$ids' ] } } }
    ]);

    const { idsToRemove } = doc;

    // Remove the duplicate documents
    Module.remove({ '_id': { '$in': idsToRemove } })
})();
chridam
  • 100,957
  • 23
  • 236
  • 235
0

if purpose is to keep only one, in case of concurrent writes, may as well just write

Module.findOne({race:'pitbull'}).select('_id')
//bla
Module.remove({race:'pitbull', _id:{$ne:idReturned}})

If it is to keep the very first one, mongodb does not guarantee results will be sorted by increasing _id (natural order refers to disk) see Does default find() implicitly sort by _id? so instead

Module.find({race:'pitbull'}).sort({_id:1}).limit(1)
grodzi
  • 5,633
  • 1
  • 15
  • 15