I have the following data:

        datePickup        dateAccepted
1  2015-06-30 14:30:28 2015-06-30 14:32:14
3  2015-07-03 21:25:14 2015-07-03 21:28:50
5  2015-07-03 12:27:30 2015-07-03 12:29:53

and would like to aggregate and average the time difference for each day:

        date    averageTimeDifferenceInSeconds
1  2015-06-30   106
3  2015-07-03   179.5

I have tried the following, as shown in the question "calculating time difference in R":

dates <- strptime( paste(df_timestamps[,0], df_timestamps[,1]), "%Y-%m-%d %H:%M:%S")
dates <- as.numeric(difftime(strptime(paste(dates[,1],"%Y-%m-%d %H:%M:%S"),strptime(paste(dates[,2]),"%Y-%m-%d %H:%M:%S"))))

But I am getting the error:

    Error in lapply(X = x, FUN = "[", ..., drop = drop) :
      argument is missing, with no default
verbati

3 Answers

So here's a data.table solution.

library(data.table)
setDT(df)[, .(Diff = mean(as.numeric(difftime(dateAccepted, datePickup, units = "secs")))), by = .(date = as.Date(datePickup))]
#          date  Diff
# 1: 2015-06-30 106.0
# 2: 2015-07-03 179.5

Unpacking this:

  • setDT(df) converts your df to a data.table in place (without making a copy, so it's very fast), and
  • [, .(Diff = mean(...)), by = .(date = as.Date(datePickup))] groups the rows by the date part of datePickup, names that grouping column date, and calculates the mean time difference in seconds for each group. as.numeric() drops the difftime class so that Diff is a plain number of seconds.
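
For comparison, the same two steps (take the per-row difference, then average it within each date) can be sketched in base R. This is only an illustration added alongside the answer, not the answerer's code: the data frame is rebuilt from the question's three sample rows, the timestamps are assumed to be stored as text, and tz = "UTC" is an assumption made so that the date grouping does not depend on the local time zone.

```r
# Sample rows from the question, stored as character strings (assumption).
df <- data.frame(
  datePickup   = c("2015-06-30 14:30:28", "2015-07-03 21:25:14", "2015-07-03 12:27:30"),
  dateAccepted = c("2015-06-30 14:32:14", "2015-07-03 21:28:50", "2015-07-03 12:29:53"),
  stringsAsFactors = FALSE
)

# Parse both columns in a fixed time zone (UTC is an assumption, not from the question).
pickup   <- as.POSIXct(df$datePickup,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
accepted <- as.POSIXct(df$dateAccepted, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Step 1: per-row difference in seconds, as a plain numeric vector.
secs <- as.numeric(difftime(accepted, pickup, units = "secs"))

# Step 2: mean of those differences within each calendar date of the pickup time.
res <- aggregate(data.frame(Diff = secs),
                 by = list(date = as.Date(pickup, tz = "UTC")),
                 FUN = mean)
print(res)
```

With the sample rows, res has one row per date, with Diff equal to 106 for 2015-06-30 and 179.5 for 2015-07-03.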
jlhoward

library(xts)
library(highfrequency)

x<-read.table(text='datePickup        dateAccepted
  "2015-06-30 14:30:28" "2015-06-30 14:32:14"
  "2015-07-03 21:25:14" "2015-07-03 21:28:50"
  "2015-07-03 12:27:30" "2015-07-03 12:29:53"',header=T)

x<-apply(x,2,as.POSIXlt,format="%Y-%m-%d %H:%M:%S",tz="GMT")

tsx<-xts(as.vector(difftime(x$dateAccepted,x$datePickup,units = "secs")),order.by = as.Date(x$datePickup))
atsx<-aggregatets(tsx,on = "days",FUN = "mean",k = 1,dropna = T)

df<-data.frame(index(atsx),as.vector(atsx))
colnames(df)<-c("date","averageTimeDifferenceInSeconds")
df

        date averageTimeDifferenceInSeconds
1 2015-06-30                          106.0
2 2015-07-03                          179.5
vck

The same result as vck's, but without "xts" and "highfrequency":

df <- data.frame( datePickup   = strptime( c( "2015-06-30 14:30:28",
                                              "2015-07-03 21:25:14",
                                              "2015-07-03 12:27:30"  ), format="%Y-%m-%d %H:%M:%S" ),
                  dateAccepted = strptime( c( "2015-06-30 14:32:14",
                                              "2015-07-03 21:28:50",
                                              "2015-07-03 12:29:53"  ), format="%Y-%m-%d %H:%M:%S" )  )

dt <- difftime( df$dateAccepted, df$datePickup, units="secs")
date <- as.Date(df$datePickup)

avg <- data.frame( date                           = unique(date),
                   averageTimeDifferenceInSeconds = sapply( unique(date), function(d){mean(dt[which(date==d)])}) )

> avg
        date averageTimeDifferenceInSeconds
1 2015-06-30                          106.0
2 2015-07-03                          179.5
mra68
  • Thanks for the answer! If I were to work on an entire data frame, how would I execute the first command in your solution (that is, without manually inputting the dates and times)? – verbati Sep 01 '15 at 11:36
  • Managed to do it with a simple subset. – verbati Sep 01 '15 at 11:50
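
On the question raised in the comments: the manual strptime(c(...)) input in the answer only builds sample data. A minimal sketch of converting columns that already exist in a data frame follows. It assumes the timestamps are stored as text in the format shown in the question; the name df_timestamps is borrowed from the question, the cols vector is a hypothetical helper, and tz = "UTC" is an assumption.

```r
# Hypothetical stand-in for the asker's data: timestamps stored as text.
df_timestamps <- data.frame(
  datePickup   = c("2015-06-30 14:30:28", "2015-07-03 21:25:14", "2015-07-03 12:27:30"),
  dateAccepted = c("2015-06-30 14:32:14", "2015-07-03 21:28:50", "2015-07-03 12:29:53"),
  stringsAsFactors = FALSE
)

# Convert the timestamp columns in place. as.POSIXct is vectorised, so each
# column is parsed in one call and no loop over rows is needed.
cols <- c("datePickup", "dateAccepted")
df_timestamps[cols] <- lapply(df_timestamps[cols], as.POSIXct,
                              format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# The converted columns can then be passed straight to difftime().
dt <- difftime(df_timestamps$dateAccepted, df_timestamps$datePickup, units = "secs")
print(dt)
```

From here, the grouping step of the answer applies unchanged to df_timestamps$datePickup and dt.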