2

I have 2 collections like that, being A a 1.5kk documents collection:

A
_id
relation: { alias, ref: 'B/<_id>' }

B
_id
name

I want to query for the elements in A, that aren't in B. This is what I get so far:

db.getCollection('A').aggregate([
    {
        $project: {
            'relation.ref': {
                $arrayElemAt: [
                    { $split: ['$relation.ref', '/'] }, 1
                ]
            },
            a: "$$ROOT"
        }
    },
    {
        $lookup: {
            from: 'B',
            localField: 'relation.ref',
            foreignField: '_id',
            as: 'b'
        }
    }
]);

This does return things in this pattern:

{
  _id,
  relation.ref,
  a: {},
  b: [{}]
}

Well I was expecting to add a $match clause to include only the ones with b being an empty array. Something like ${match: { b: {$size: 0} } }

The problem is that this query seems to take too long. I never saw it working, thus I dont know it is the right way.

Is there anything to make it perform better or a different way to get this?

For example, if I force a wrong join, it will return an empty array on b. But querying the size of this array is always slow, even when I query for {$size: 1}, that is 99,9% of the results. I wonder if i could do something like this to select only the missing values during the join:

{$lookup.pipeline: [{$myself: null}] }

Ideas, please?

Thanks you

Victor Ferreira
  • 6,151
  • 13
  • 64
  • 120

1 Answers1

1

Your $match syntax was close but a little off, you should use something like this:

{
    $match: {
      $expr: {
        $eq: [
          0,
          {
            $size: "$b"
          }
        ]
      }
    }
  }

MongoPlayground

However wouldn't it be easier to update the relation field to only contain valid refs? and if that's the case you can just query 'relation.ref': {$exists: false}

Tom Slabbaert
  • 21,288
  • 10
  • 30
  • 43
  • Hey, tom, where should I use the `'relation.ref': {$exists: false}`? In the projection statement? The `$match` expression, where are you putting it? In the `$lookup.pipeline`? It wont allow me, because it says you can`t pass `foreignField` and `localField` alongside `pipeline`. But `b` only exists after the join – Victor Ferreira Jul 13 '20 at 15:46
  • if I put the expression you sent me in the outer $match, same lavel as `project` and `lookup` it takes too long. after 200 seconds it hangs up. should I change the timeout? – Victor Ferreira Jul 13 '20 at 15:56
  • `$lookup` is a really expensive operation for Mongo and it's usually not recommended on using it. that's why i suggested on the alternative which is updating the `relation` field ahead of time and then you can know which documents are relevant without the `$lookup`. – Tom Slabbaert Jul 13 '20 at 15:57