17

I'm writing a query that requires a $lookup between two tables and as I understand it, it's essential that the foreignField have an index in order to perform this join in a timely fashion. However, even after adding an index on the field, the query is still falling back to COLLSCAN.

db.users.aggregate([
  {$lookup:{ from: "transactions", localField: '_id', foreignField: 'uid', as: 'transaction' }},
  { $match: { transaction: { "$size" : 0} } },
  { $count: "total"},
], { explain: true })

This returns:

"queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "test.users",
    "indexFilterSet" : false,
    "parsedQuery" : {

    },
    "winningPlan" : {
        "stage" : "COLLSCAN",
        "direction" : "forward"
    },
    "rejectedPlans" : [ ]
}

As I mentioned, I do have the uid field indexed in the transactions collection:

> db.transactions.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.transactions"
    },
    {
        "v" : 1,
        "key" : {
            "uid" : 1
        },
        "name" : "uid_1",
        "ns" : "test.transactions"
    }
]

The query takes a few minutes to run in a DB of approximately 7M documents. I'm using MongoDB v3.4.7. Any idea as to what I could be doing wrong? Thanks in advance!

ashe540
  • 2,041
  • 2
  • 17
  • 17
  • 3
    The only stage that can use an index in an aggregation pipeline is the **first**, and that **must be either** `$match` or `$geoNear` or `$sort`. Whilst `$lookup` can likely use an index on the "foreign key", this is not presently reported in in the stats at this time. For what you are attempting here, it is not possible for an index to be used. – Neil Lunn Aug 29 '17 at 12:46
  • 4
    As a side note: using relational data models on MongoDB and trying to make it more SQLish is either a sign of bad data modeling or a poor technology decision. Every time I saw `$lookup` in an aggregation described as a query, it was possible to get rid of it with proper data modeling. – Markus W Mahlberg Aug 30 '17 at 08:42
  • Thanks for the info! I was unaware of the requirement of the use of an index in the lookup stage. – ashe540 Sep 03 '17 at 20:31
  • 7
    @NeilLunn the rules have changed for 3.6, it seems any stage can now use indexes https://docs.mongodb.com/manual/core/aggregation-pipeline/#aggregation-pipeline-operators-and-performance – PirateApp Feb 27 '18 at 14:20
  • could you add sample documents of both the collections? – Vijay Rajpurohit Aug 29 '19 at 11:56
  • @Mahlberg - I cannot agree with you. In your logic, Mongo shouldn't introduce $lookup aggregate at all. Data are more or less related. I find $lookup is very useful in some cases because I don't want to replicate data so much just for a rarely used case. Please don't say "bad data modeling or a poor technology decision" because you don't know the actual business requirement. – newman Jan 25 '22 at 04:49

2 Answers2

5

The "stage" : "COLLSCAN", isn't referring to the $lookup at all.

The first step in that aggregation pipeline is to fetch all of the documents from the 'users' collection. Since there are no filters at all provided for that, collection scan is the most efficient method.

The $lookup stage should be planned like any other query, and would likely use the index.

Joe
  • 25,000
  • 3
  • 22
  • 44
2

Because your aggregation pipeline first stage has no $match or $sort or $geoNear query on indexed key and In $match stage you didn't query on any index key.

Case 1: If you do $match on indexed key in first stage, WinningPlan stage will be "FETCH" and stage of inputStage will be "IXSCAN"

"winningPlan" : {
    "stage" : "FETCH",
    "inputStage" : {
            "stage" : "IXSCAN",
        ...
    }
}

Case 2: If you do $match on non-indexed key in first stage, WinningPlan stage will be "COLLSCAN"

"winningPlan" : {
    "stage" : "COLLSCAN"
}

Case 3: If You do $match on index key after lookup (according to your query), WinningPlan stage will be "FETCH" and inputStage will be "IXSCAN".

Case 4: If You do $match on non-index key after lookup (just you did), WinningPlan stage will be "COLLSCAN".

For 7M record, you must use index in your queries. Don't do too much indexing, because they will stored in the RAM and you can't use $ne or $nin on indexed key properly.

Mongodb Docs: Optimizing Aggregation Pipeline

Mongodb Docs: Indexing Strategies

Ashutosh
  • 1,029
  • 10
  • 23