2

I'm on a project working with data of a bike-sharing service. Each trip has the following info

> db.bikes.find({bike:9990}).pretty()
{
    "_id" : ObjectId("5bb59fd8e9fb374bf0cd5c1c"),
    "gender" : "M",
    "userAge" : 49,
    "bike" : 9990,
    "depStation" : 150,
    "depDate" : "01/08/2018",
    "depHour" : "0:00:13",
    "arrvStation" : 179,
    "arrvDate" : "01/08/2018",
    "arrvHour" : "0:23:38"
}

How do I group for each hour of the day and count the number of trips made in that specific hour? I'm trying with this query

db.bikes.aggregate(
  { 
     $group:{_id:{$hour: "$depHour"}, trips:{$sum: 1}}
  }
)

But it throws this error

    "ok" : 0,
    "errmsg" : "can't convert from BSON type string to Date",
    "code" : 16006,
    "codeName" : "Location16006"
chridam
  • 100,957
  • 23
  • 236
  • 235
jschz
  • 39
  • 9
  • What are the possible values for the field `"depHour"`? What is its general time format, is it "h:mm:ss"? – chridam Oct 04 '18 at 18:19

2 Answers2

2

The depDate and depHour fields are all string values that denote the day and the hour respectively so there is no need to use the date operators to convert the fields to date objects, all you need is to extract the hour part using $substrCP and then use them directly as expressions in your $group _id as:

db.bikes.aggregate([
    { '$group': {
        '_id': {
            'day': '$depDate',
            'hour': { '$substrCP': [ '$depHour', 0, 2 ] }
        }, 
        'trips': { '$sum': 1 }
    } }
])
chridam
  • 100,957
  • 23
  • 236
  • 235
  • Thanks for the help, it worked! Is it easier to handle data that way (day and hour separately) or using ISOdate for example? – jschz Oct 04 '18 at 19:47
  • 1
    A bit opinionated but this [thread](https://stackoverflow.com/questions/6764821/what-is-the-best-way-to-store-dates-in-mongodb) will enlighten you better, personally would go for the separate date and time string formats – chridam Oct 04 '18 at 20:03
  • 1
    Thanks! Also another question, using substring works when the hour value it's between 0 and 9 but since data it's in 24 hour format it doesn't work from 10-23. How can I handle this? Using regex? – jschz Oct 04 '18 at 20:35
  • From your example it doesn't look like 24-hr format (`"0:00:13"`) hence the questions I asked in the comments for the original question. My initial answer was based on the assumption that the hour part is the second token in the given format `0:hh:mm` i.e. the `00` part.. Obviously that was wrong but if it's indeed in the 24-hr format `hh:mm:ss` then `depHour` should read `00:00:13` instead of `0:00:13`. Nonetheless, the updated `{ '$substrCP': [ '$depHour', 0, 2 ] }` will handle the general 24-hr case although the hour part will be displayed as `0:` for 12 am or `9:` for 9 am, for example. – chridam Oct 05 '18 at 07:27
  • 1
    Yeah you're. right, Sorry, I just pasted in the first document the query gave me. I fixed it with split also the dataset has inconsistency handling times, as you said different hour formats(H:MM and HH:MM). Anyway your solution worked out for me. Thank you! – jschz Oct 06 '18 at 16:54
1

You can try below aggregation with mognodb 4.0

db.collection.aggregate([
  { "$group": {
    "_id": {
      "$dateToString": {
        "date": {
          "$toDate": { "$concat": ["$depDate", "T", "$depHour"] }
        },
        "format": "%Y-%m-%d-%H"
      }
    },
    "trips": { "$sum": 1 }
  }}
])
Ashh
  • 44,693
  • 14
  • 105
  • 132
  • I believe this is overkill; there certainly is no need to convert the concatenated string to date object then back again to a string since the date strings already suffice. Apart from that the above won't give the correct results because the `_id` expression `{ "$dateToString: }` will always return `"YYY-MM-DD-00"` as you forget there is always a `0` prefixed to `"$depHour"` – chridam Oct 04 '18 at 08:47
  • From my thinking that is not a prefix it is of format ("00:00:13")HH:MM:SS. So after converting to date it will be easier to `$group` with any desired format we want. – Ashh Oct 04 '18 at 08:55
  • Actually you are right, the hour is the first digit but the format is `"h:mm:ss"`, not `"HH:mm:ss"` as you thought. I would still maintain the conversion can be a bit overkill though since you just need to extract the hour part and the day to get the group, otherwise you are right. – chridam Oct 04 '18 at 09:12
  • 1
    The main part here is the OP now can use `$group` with any format he/she wants. Your aggregation still doesn't work if the hours have two digits in it :-) – Ashh Oct 04 '18 at 16:04