12

Following is the aggregation query :

[
  {
    "$match": {
      "UserId": {
        "$in": [
          5
        ]
      },
      "WorkflowStartTime": {
        "$gte": ISODate('2015-04-09T00:00:00.000Z'),
        "$lte": ISODate('2015-04-16T00:00:00.000Z')
      }
    }
  },
  {
    "$group": {
      "_id": {
        "Task": "$TaskId",
        "WorkflowId": "$WorkflowInstanceId"
      },
      "TaskName": {
        "$first": "$Task"
      },
      "StartTime": {
        "$first": "$StartTime"
      },
      "EndTime": {
        "$last": "$EndTime"
      },
      "LastExecutionTime": {
        "$last": "$StartTime"
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "_id": 1,
      "LastExecutionTime": 1,
      "TaskName": 1,
      "AverageExecutionTime": {
        "$subtract": [
          "$EndTime",
          "$StartTime"
        ]
      },
      "WorkflowName": 1
    }
  },
  {
    "$group": {
      "_id": "$_id.Task",
      "LastExecutionTime": {
        "$last": "$LastExecutionTime"
      },
      "AverageExecutionTime": {
        "$avg": "$AverageExecutionTime"
      },
      "TaskName": {
        "$first": "$TaskName"
      },
      "TotalInstanceCount": {
        "$sum": 1
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "Id": "$_id",
      "_id": 0,
      "Name": "$TaskName",
      "LastExecutionDate": {
        "$substr": [
          "$LastExecutionTime",
          0,
          30
        ]
      },
      "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
      "TotalInstanceCount": "$TotalInstanceCount",
      "WorkflowName": 1
    }
  }
]

My collection documents are as follows :

{
        "_id" : ObjectId("550ff07ce4b09bf056df4ac1"),
        "OutputData" : "xyz",
        "InputData" : null,
        "Location" : null,
        "ChannelName" : "XYZ",
        "UserId" : 5,
        "TaskId" : 95,
        "ChannelId" : 5,
        "Status" : "Success",
        "TaskTypeId" : 7,
        "WorkflowId" : 37,
        "Task" : "XYZ",
        "WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"),
        "EndTime" : ISODate("2015-03-23T05:22:44Z"),
        "StartTime" : ISODate("2015-03-23T05:22:44Z"),
        "TaskType" : "TRIGGER",
        "WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce",
        "StackTrace" : null,
        "WorkflowName" : "XYZ data workflow"
}

I have a index on {WorkflowStartTime:1,UserId:1, StartTime:1}

Their are hardly 900000 records in collection, and as it is i am using a subset of data while quering using date range still it taking around 1.5 to 1.7 seconds. I have used aggregation framework with other collections with huge data and the performance is very good. Don't know what is wrong with this query as its showing very slow output, i expect it to be in mills as its a real time analytics query. Any pointer on it appreciated.

Output when {explain : true } added to aggregation query

{
  "stages": [


       {
          "$cursor": {
            "query": {
              "UserId": {
                "$in": [
                  5
                ]
              },
              "WorkflowStartTime": {
                "$gte": "ISODate(2015-04-09T00:00:00Z)",
                "$lte": "ISODate(2015-04-16T00:00:00Z)"
              }
            },
            "fields": {
              "EndTime": 1,
              "StartTime": 1,
              "Task": 1,
              "TaskId": 1,
              "WorkflowInstanceId": 1,
              "WorkflowName": 1,
              "_id": 0
            },
            "plan": {
              "cursor": "BtreeCursor ",
              "isMultiKey": false,
              "scanAndOrder": false,
              "indexBounds": {
                "WorkflowStartTime": [
                  [
                    "ISODate(2015-04-16T00:00:00Z)",
                    "ISODate(2015-04-09T00:00:00Z)"
                  ]
                ],
                "UserId": [
                  [
                    5,
                    5
                  ]
                ]
              },
              "allPlans": [
                {
                  "cursor": "BtreeCursor ",
                  "isMultiKey": false,
                  "scanAndOrder": false,
                  "indexBounds": {
                    "WorkflowStartTime": [
                      [
                        "ISODate(2015-04-16T00:00:00Z)",
                        "ISODate(2015-04-09T00:00:00Z)"
                      ]
                    ],
                    "UserId": [
                      [
                        5,
                        5
                      ]
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "$group": {
            "_id": {
              "Task": "$TaskId",
              "WorkflowId": "$WorkflowInstanceId"
            },
            "TaskName": {
              "$first": "$Task"
            },
            "StartTime": {
              "$first": "$StartTime"
            },
            "EndTime": {
              "$last": "$EndTime"
            },
            "LastExecutionTime": {
              "$last": "$StartTime"
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": true,
            "LastExecutionTime": true,
            "TaskName": true,
            "AverageExecutionTime": {
              "$subtract": [
                "$EndTime",
                "$StartTime"
              ]
            },
            "WorkflowName": true
          }
        },
        {
          "$group": {
            "_id": "$_id.Task",
            "LastExecutionTime": {
              "$last": "$LastExecutionTime"
            },
            "AverageExecutionTime": {
              "$avg": "$AverageExecutionTime"
            },
            "TaskName": {
              "$first": "$TaskName"
            },
            "TotalInstanceCount": {
              "$sum": {
                "$const": 1
              }
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": false,
            "Id": "$_id",
            "Name": "$TaskName",
            "LastExecutionDate": {
              "$substr": [
                "$LastExecutionTime",
                {
                  "$const": 0
                },
                {
                  "$const": 30
                }
              ]
            },
            "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
            "TotalInstanceCount": "$TotalInstanceCount",
            "WorkflowName": true
          }
        }
      ],
      "ok": 1
    }
Ninad
  • 474
  • 8
  • 26
  • [what does the explain output say about the query](http://docs.mongodb.org/manual/reference/command/aggregate/)? – Philipp Apr 15 '15 at 06:25
  • 1
    @Philipp explain output added, it always showing me this much only in explain output but in docs its something else m i missing something ? – Ninad Apr 15 '15 at 06:39
  • It says, that it's using `BtreeCursor`, that should be fine: *A query that uses an index has a cursor of type BtreeCursor* ( http://openmymind.net/Speedig-Up-Queries-Understanding-Query-Plans/ ). Could you execute your query without aggregation framework and tell us how many matching document it finds? `...` maybe it could also help to reverse the index to `{UserId:1,WorkflowStartTime:1}`, because your `$match` query has first `UserId` and then `WorkflowStartTime`. But could also lead to no benefits at all. – Benjamin M Apr 15 '15 at 09:22
  • @BenjaminM do you mean just add match criteria in find query with explain ? – Ninad Apr 15 '15 at 09:30
  • no, just use `count()` with your query. just want to see how many documents the aggregation will operate on. – Benjamin M Apr 15 '15 at 11:45
  • @BenjaminM sorry i am still not geting what you mean by just use count(), if i am guessing right do you mean use **db.collection.count({"UserId":{"$in":[5]},"WorkflowStartTime":{"$gte":ISODate('2015-04-09T00:00:00.000Z'),"$lte":ISODate('2015-04-16T00:00:00.000Z')}})** ?? – Ninad Apr 15 '15 at 13:29
  • yes right. i just wanted to know how many documents the aggregation will work on. must be several thousand if it takes 2 seconds. or how many docs does it count? – Benjamin M Apr 15 '15 at 15:11

1 Answers1

2

The aggregation don't use any Index. You need create a new Index:

{UserId:1,WorkflowStartTime:1}

If all is good, the agregation + explain must appear this line:

    "winningPlan" :...
  • my query is already using index in match phase, and using workflowstartTime before userId in index is helpful bcz we are doing range query on WorkflowStartTime – Ninad Apr 21 '15 at 04:32
  • Another solution you could change the order of "match", first "WorkflowStartTime" and second "UserId". I beleive that the Index not is using in the aggregation. – Alberto J García Gago Apr 21 '15 at 06:14
  • 2
    In explain output its showing, the value for **cursor** under **plan** section is **Btree** which indicated that the index is used – Ninad Apr 21 '15 at 10:15