ddply summarize data hourly

Question

I would like to summarize frequency of a dataset hourly and two-hourly. The time column's format is hh:mm:ss.
The below code is working to summarize data monthly but I have not found any similar code for hourly or two-hourly. Thanks in advance.

data2$StartDate <- as.Date(data2$StartDate, "%m/%d/%Y") data4 <- ddply(data2, .(format(StartDate, "%m")), summarize, freq=length(StartDate))

The dataset is like this:

    TripId  StartDate   StartTime
 <int>     <date> <S3: times>
1 15335543 2016-01-01    00:14:00
2 15335544 2016-01-01    00:14:00
3 15335607 2016-01-01    02:00:00
4 15335608 2016-01-01    02:01:00
5 15335613 2016-01-01    02:16:00
6 15335639 2016-01-01    02:50:00

Please share your data. [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Tung, Mar 30 '18 at 04:58
@Tung data is like this: TripId StartDate StartTime 15335543 1/1/2016 12:14:00 AM 15335544 1/1/2016 12:14:00 AM 15335607 1/1/2016 2:00:00 AM 15335608 1/1/2016 2:01:00 AM 15335613 1/1/2016 2:16:00 AM 15335639 1/1/2016 2:50:00 AM 15335651 1/1/2016 3:08:00 AM 15335962 1/1/2016 8:08:00 AM 15336135 1/1/2016 9:34:00 AM 15336141 1/1/2016 9:37:00 AM 15336164 1/1/2016 9:46:00 AM 15336247 1/1/2016 10:18:00 AM 15336283 1/1/2016 10:39:00 AM 15336330 1/1/2016 10:59:00 AM 15336333 1/1/2016 11:00:00 AM 15336338 1/1/2016 11:04:00 AM 15336362 1/1/2016 11:17:00 AM — Sad Vaseb, Mar 30 '18 at 05:13
@SadVaseb, can you edit your post and include this data set? — hpesoj626, Mar 30 '18 at 05:19

score 2 · Accepted Answer · answered Mar 30 '18 at 11:48

If I understood the question correctly then

For hourly frequency:

library(dplyr)

df %>%
  mutate(start_timestamp = as.POSIXct(paste(df$StartDate, df$StartTime), tz="UTC", format="%Y-%m-%d %H")) %>%
  right_join(data.frame(seq_h = as.POSIXct(unlist(lapply(unique(df$StartDate), 
                                                         function(x) seq(from=as.POSIXct(paste(x, "00:00:00"), tz="UTC"),
                                                                         to=as.POSIXct(paste(x, "23:00:00"), tz="UTC"),
                                                                         by="hour"))), origin="1970-01-01", tz="UTC")), by=c("start_timestamp" = "seq_h")) %>%
  group_by(start_timestamp) %>%
  summarise(freq=sum(!is.na(TripId)))

Output is:

   start_timestamp      freq
 1 2016-01-01 00:00:00     2
 2 2016-01-01 01:00:00     1
 3 2016-01-01 02:00:00     1
 4 2016-01-01 03:00:00     0
 5 2016-01-01 04:00:00     0
...

For two-hourly frequency:

library(dplyr)

df %>%
  mutate(start_timestamp = as.POSIXct(cut(as.POSIXct(paste(df$StartDate, df$StartTime), tz="UTC"), breaks="2 hours"), tz="UTC")) %>%
  right_join(data.frame(seq_h = as.POSIXct(unlist(lapply(unique(df$StartDate), 
                                                         function(x) seq(from=as.POSIXct(paste(x, "00:00:00"), tz="UTC"),
                                                                         to=as.POSIXct(paste(x, "23:00:00"), tz="UTC"),
                                                                         by="2 hours"))), origin="1970-01-01", tz="UTC")), by=c("start_timestamp" = "seq_h")) %>%
  group_by(start_timestamp) %>%
  summarise(freq=sum(!is.na(TripId)))

Output is:

   start_timestamp      freq
 1 2016-01-01 00:00:00     3
 2 2016-01-01 02:00:00     1
 3 2016-01-01 04:00:00     0
 4 2016-01-01 06:00:00     0
 5 2016-01-01 08:00:00     0
...

Sample data:

df <- structure(list(TripId = c(15335543L, 15335544L, 15335607L, 15335608L, 
15335613L, 15335639L), StartDate = c("2016-01-01", "2016-01-01", 
"2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"), StartTime = c("00:14:00", 
"00:14:00", "01:00:00", "02:01:00", "02:16:00", "02:50:00")), .Names = c("TripId", 
"StartDate", "StartTime"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

ddply summarize data hourly

1 Answers1