1

part of our data is like this...

    {
        "imei": "LDSJB0005",
        "timestamp": "2017-11-11 14:49:26",
        "trip_id" : 16,
        "data_type": 2,
        "longitude": "-",
        "latitude": "-",
        "loc_wgs84": {
            "type": "Point",
            "coordinates": [
                0,
                0
             ]
        }
    }

There are 3 million entries like this in db.vehicles and the index is like this.

> db.vehicles.getIndexes()
[
    {
            "v" : 2,
            "key" : {
                    "_id" : 1
            },
            "name" : "_id_",
            "ns" : "demo.vehicles"
    },
    {
            "v" : 2,
            "key" : {
                    "imei" : 1,
                    "data_type" : 1,
                    "trip_id" : 1
            },
            "name" : "imei_1_data_type_1_trip_id_1",
            "ns" : "demo.vehicles"
    },
    {
            "v" : 2,
            "key" : {
                    "loc_wgs84" : "2dsphere"
            },
            "name" : "loc_wgs84_2dsphere",
            "ns" : "demo.vehicles",
            "2dsphereIndexVersion" : 3
    }
]

Now, we are doing aggregation query like below. Just trying to get every vehicles' imei which has the largest trip_id value(which means it's the vehicles last ride of the day)

[
    {$match: {'data_type': 1}},
    {$sort: {imei:1, trip_id:-1}},
    {$group: {
        _id: '$imei',
        imei: {$first: '$imei'},
        trip_id: {$first: '$trip_id'}
    }}
]

The query takes like 10 minutes to complete. Help.

felix
  • 9,007
  • 7
  • 41
  • 62

0 Answers0