12

I am trying to calculate the frequency of bikes that are taken by people using a dataset provided by Leada.

Here is the code:

library(dplyr)

setAs("character", "POSIXlt", function(from) strptime(from, format = "%m/%d/%y %H:%M"))
d <- read.csv("http://mandrillapp.com/track/click/30315607/s3-us-west-1.amazonaws.com?p=eyJzIjoiemxlVjNUREczQ2l5UFVPeEFCalNUdmlDYTgwIiwidiI6MSwicCI6IntcInVcIjozMDMxNTYwNyxcInZcIjoxLFwidXJsXCI6XCJodHRwczpcXFwvXFxcL3MzLXVzLXdlc3QtMS5hbWF6b25hd3MuY29tXFxcL2RhdGF5ZWFyXFxcL2Jpa2VfdHJpcF9kYXRhLmNzdlwiLFwiaWRcIjpcImEyODNiNjMzOWJkOTQxMGM5ZjlkYzE0MmQ0NDQ5YmU4XCIsXCJ1cmxfaWRzXCI6W1wiMTVlYzMzNWM1NDRlMTM1ZDI0YjAwODE4ZjI5YTdkMmFkZjU2NWQ2MVwiXX0ifQ",
              colClasses = c("numeric", "numeric", "POSIXlt", "factor", "numeric", "POSIXlt", "factor", "numeric", "numeric", "factor", "character"),
              stringsAsFactors = T)
names(d)[9] <- "BikeNo"

d <- tbl_df(d)

d <- d %>% mutate(Weekday = factor(weekdays(Start.Date)))
d %>% group_by(Weekday) 
  %>% summarise(Total = n()) 
  %>% select(Weekday, Total)

It is strange but dplyr does not want to group data by Weekday saying:

Error: column 'Start.Date' has unsupported type

Why it cares about Start.Date column where I group by a factor? You can run the code locally to reproduce the error: it will download the data automatically.

P.S. I am using dplyr version: dplyr_0.3.0.2

Hugh
  • 15,521
  • 12
  • 57
  • 100
Sergei
  • 1,617
  • 15
  • 31
  • 3
    Your problem may be related to that `dplyr` does not support variables of class `POSIXlt`. See e.g. [**here**](http://stackoverflow.com/questions/27641129/dplyr-mutate-each-colswise-coercion-to-posixlt-fails) and [**here**](http://stackoverflow.com/questions/27641129/dplyr-mutate-each-colswise-coercion-to-posixlt-fails). Can you work with `POSIXct` instead? – Henrik Jan 07 '15 at 21:29
  • Yes, I can. And you are right, if I convert Start.Date and End.Date to POSIXct dplyr works fine! Wow, that is a strange behavior to me. Do you know a way to force strptime to create POSIXct instead of POSIXlt? I don't want to have 2 separate lines converting the data after reading the file. – Sergei Jan 07 '15 at 21:40
  • 1
    Use `as.POSIXct` instead of `strptime`. On the other hand, maybe you can leave them as `character` as well. At least it seems to me that the code you show isn't really depending on `POSIX`. – Henrik Jan 07 '15 at 21:43
  • Strangely enough if I replace strptime to as.POSIXct I get: Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format – Sergei Jan 07 '15 at 21:46
  • But if I parse the date as POSIXlt and then convert it to POSIXct there is no problem – Sergei Jan 07 '15 at 21:48
  • 1
    Please start by reading `?as.POSIXct`, especially the `format` argument. – Henrik Jan 07 '15 at 21:48
  • 1
    @Sergei, I think you have enough here to answer your own question now ... – Ben Bolker Jan 07 '15 at 22:08
  • @BenBolker, I wish I could. The funny thing, that I still don't get it. Why if I use the following setAs function: as.POSIXct(strptime(from, format = "%m/%d/%y %H:%M")) I get an error, but if I read the data as a character and then use the same function it does not gives me an error. it seems there is something weird in setAs itself. – Sergei Jan 08 '15 at 15:26

2 Answers2

7

The lubridate package is useful when dealing with dates. Here is the code to parse Start.Date and End.Date, extract week days, then group by week days:

Read dates as character vectors

library(dplyr)
library(lubridate)
# For some reason your instruction to load the csv directly from a url
# didn't work. I save the csv to a temporary directory.
d <- read.csv("/tmp/bike_trip_data.csv", colClasses = c("numeric", "numeric", "character", "factor", "numeric", "character", "factor", "numeric", "numeric", "factor", "character"), stringsAsFactors = T)

names(d)[9] <- "BikeNo"
d <- tbl_df(d)

Use lubridate to convert start date and end date

d <- d %>% 
  mutate(
    Start.Date = parse_date_time(Start.Date,"%m/%d/%y %H:%M"),
    End.Date = parse_date_time(End.Date,"%m/%d/%y %H:%M"),
    Weekday = wday(Start.Date, label=TRUE, abbr=FALSE))

Number of lines per week day

d %>%
  group_by(Weekday) %>%
  summarise(Total = n())

#     Weekday Total
# 1    Sunday 10587
# 2    Monday 23138
# 3   Tuesday 24678
# 4 Wednesday 23651
# 5  Thursday 25265
# 6    Friday 24283
# 7  Saturday 12413
Paul Rougieux
  • 10,289
  • 4
  • 68
  • 110
2

I am sorry if this issue is long forgotten, but it weirds me out to see everyone recommending to convert to POSIX.ct or character when I have been using the much simpler solution of calling the arrange function from the plyr package using plyr::arrange, as it doesn't seem to have issues with the POSIXlt formats. As I am usually not the one finding the easiest solution for a problem in R, I am starting to think that there is something wrong with it. Does it not do the same as the dplyr version?