5

I am noticing huge performance differences in what appears to be same aggregate, at least conceptually. The tests were made on a simple collection structure, that has an _id and a name and a createdAt, but there 20 million of those. There is an index on createdAt. It's hosted on an mlab cluster, version is 3.6.9 WiredTiger.

I am trying to get a simple paging going using aggregate, I know I could use find and limit, but I like to add more elements to the pipeline, the example I give is very distilled.

db.getCollection("runnablecalls").aggregate([
         {
          $facet: {
            docs: [
                { $sort:  {createdAt: -1} },
                { $limit:  25 },
                { $skip:  0 },                                
            ],
            page_info: [
              { $group: { _id: null, total: { $sum: 1 } } 
              }
            ],
          }
        }                           
      ])

That takes almost 40s. Now if I moved the $sort and $limit outside of the facet it takes 0.042s.

db.getCollection("runnablecalls").aggregate([
        { $sort:  {createdAt: -1} },
        { $limit:  25 }, 
        {
          $facet: {
            docs: [           
                { $skip:  0 },                
            ],            
            page_info: [
                { 
                  $group: { _id: null, total: { $sum: 1 } } 
                } 
            ]}
        },                           
      ])

The page_info facet makes no difference at the end, I can take it out without difference, I am just leaving it in because I like use it. I know how to solve the problem using two queries a count and an aggregate without a $facet. I just like to understand why this happens.

Stan Wiechers
  • 1,962
  • 27
  • 45

1 Answers1

2

The first aggregation doesn't use an index. The second aggregation uses an index and filters first 25 docs before it enters $facet. You can add explain('executionStats') to see query plans and indexes usages. For example,

db.getCollection("runnablecalls").explain('executionStats').aggregate([
         {
          $facet: {
            docs: [
                { $sort:  {createdAt: -1} },
                { $limit:  25 },
                { $skip:  0 },                                
            ],
            page_info: [
              { $group: { _id: null, total: { $sum: 1 } } 
              }
            ],
          }
        }                           
      ])
simagix
  • 1,832
  • 1
  • 9
  • 11