45

I have two MongoDB collections that share a common _id. Using the mongo shell, I want to find all documents in one collection that do not have a matching _id in the other collection.

Example:

> db.Test.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 })
> db.Test.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 })
> db.Test.insert({ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 })
> db.Test.insert({ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 })
> db.Test.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 }
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
> db.Test2.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 });
> db.Test2.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 });
> db.Test2.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 }

Now I want some query or queries that returns the two documents in Test where the _id's do not match any document in Test2:

{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }

I've tried various combinations of $not, $ne, $or, $in but just can't get the right combination and syntax. Also, I don't mind if db.Test2.find({}, {"_id": 1}) is executed first, saved to some variable, which is then used in a second query (though I can't get that to work either).

Update: Zachary's answer pointing to the $nin answered the key part of the question. For example, this works:

> db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"), ObjectId("4f08a766306b428fb9d8bb2f")]}})
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }

But (and acknowledging this is not scalable but trying to it anyway because its not an issue in this situation) I still can't combine the two queries together in the shell. This is the closest I can get, which is obviously less than ideal:

vals = db.Test2.find({}, {"_id": 1}).toArray()
db.Test.find({"_id": {"$nin": [ObjectId(vals[0]._id), ObjectId(vals[1]._id)]}})

Is there a way to return just the values in the find command so that vals can be used directly as the array input to $nin?

Raman
  • 17,606
  • 5
  • 95
  • 112

4 Answers4

86

In mongo 3.2 the following code seems to work

db.collectionb.aggregate([
  {
    $lookup: {
        from: "collectiona",
        localField: "collectionb_fk",
        foreignField: "collectiona_fk",
        as: "matched_docs"
      }
   },
   {
      $match: {
        "matched_docs": { $eq: [] }
      }
   }
]);

based on this https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use-lookup-with-an-array example

27px
  • 430
  • 5
  • 16
Nikos Tsagkas
  • 1,287
  • 2
  • 17
  • 31
36

Answering your follow-up. I'd use map().

Given this:

> b1 = {i: 1}
> db.b.save(b1)
> db.b.save({i: 2})
> db.a.save({_id: b1._id})

All you need is:

> vals = db.a.find({}, {id: 1}).map(function(a){return a._id;})
> db.b.find({_id: {$nin: vals}})

which returns

{ "_id" : ObjectId("4f08c60d6b5e49fa3f6b46c1"), "i" : 2 }
  • Ahh, `map`, sweet! That works perfectly. Wish I could accept both Zachary's answer and yours. – Raman Jan 08 '12 at 03:23
  • BTW, are all the possible cursor methods documented somewhere? I didn't see the `map` function mentioned at http://www.mongodb.org/display/DOCS/Queries+and+Cursors, nor at http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-CursorMethods. – Raman Jan 08 '12 at 03:26
  • 2
    `map` is just a good, old-fashioned javascript standard library function that runs on the array pulled out of Mongo. The Mongo shell supports arbitrary JS. –  Jan 08 '12 at 05:13
  • The result of the find command is a cursor, not an array. So is the shell implicitly converting the cursor to an array when a javascript function such as map is called? Otherwise I would have expected to have to do `find(...).toArray().map(...)`. – Raman Jan 09 '12 at 01:02
  • 1
    You're right, looks like it's just defined explicitly: https://github.com/mongodb/mongo/blob/master/src/mongo/shell/query.js. Also, check out `for (var key in cursor){print(key);}` (how I found it). –  Jan 09 '12 at 01:24
17

You will have to save the _ids from collection A to not pull them again from collection B, but you can do it using $nin. See Advanced Queries for all of the MongoDB operators.

Your end query, using the example you gave would look something like:

db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"), 
 ObjectId("4f08a766306b428fb9d8bb2f")]}})`

Note that this approach won't scale. If you need a solution that scales, you should be setting a flag in collections A and B indicating if the _id is in the other collection and then query off of that instead.

Updated for second part:

The second part is impossible. MongoDB does not support joins or any sort of cross querying between collections in a single query. Querying from one collection, saving the results and then querying from the second is your only choice unless you embed the data in the rows themselves as I mention earlier.

Derlin
  • 9,572
  • 2
  • 32
  • 53
Zachary Anker
  • 4,500
  • 21
  • 19
  • 1
    Minor complaint: the concept is right, but you got the Test and Test2 backwards in the answer. "Now I want some query or queries that returns the two documents in Test where the _id's do not match any document in Test2" – Eve Freeman Jan 07 '12 at 21:05
  • `> db.Test.find({"_id" : {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"), ObjectId("4f08a766306b428fb9d8bb2f")]}});` gives `{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 } { "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }` – Eve Freeman Jan 07 '12 at 21:08
  • Thanks, that answered the key part of the question, but this is not very useful without also answering the second part. I updated the question to reflect. – Raman Jan 07 '12 at 21:27
  • 1
    from mongo 3.2 you can use $lookup in aggregation framework and use data from other collections – profesor79 Mar 01 '16 at 16:22
2

I've made a script, marking all documents on the second collection that appears in first collection. Then processed the second collection documents.

var first = db.firstCollection.aggregate([ {'$unwind':'$secondCollectionField'} ])

while (first.hasNext()){ var doc = first.next(); db.secondCollection.update( {_id:doc.secondCollectionField} ,{$set:{firstCollectionField:doc._id}} ); }

...process the second collection that has no mark

db.secondCollection.find({"firstCollectionField":{$exists:false}})
pablo.vix
  • 2,103
  • 2
  • 15
  • 12