1

I have data that looks like this:

ID Date1               Date2               Date3
A  2016-04-25 09:15:29 2016-04-25 14:01:19 2016-04-26 13:28:19
B  2016-04-25 09:15:29 2016-04-25 14:01:19 2016-04-26 13:28:19

I want the difference in hours between each date combination (ideally only going forward in time i.e. no negative differences). I know how to do this manually (calculating number of days between 2 columns of dates in data frame):

df$Date2_Date1 <- difftime(df$Date2,df$Date1, units = c("hours"))

However, my real data frame is much bigger and this would be very tedious (but possible). I have read this (Calculate pairwise-difference between each pair of columns in dataframe) and this (R: Compare all the columns pairwise in matrix) which lead me to try this:

nm1 <- outer(colnames(df), colnames(df), paste, sep="_")
indx1 <-  which(lower.tri(nm1, diag=TRUE))
df2 <- outer(1:ncol(df), 1:ncol(df), 
             function(x,y) df[,x]-df[,y])

Which I think is getting me close but my ideal output is this:

ID Date2_Date1 Date3_Date1 Date3_Date2
A  x hours     y hour      ...
B  ..

Are there any nice solutions for this?

Community
  • 1
  • 1
Pete900
  • 2,016
  • 1
  • 21
  • 44

1 Answers1

3

Here's one way, based on combn() and apply():

df <- data.frame(
    ID=c('A','B'),
    Date1=as.POSIXct(c('2016-04-25 09:15:29','2016-04-25 09:15:29')),
    Date2=as.POSIXct(c('2016-04-25 14:01:19','2016-04-25 14:01:19')),
    Date3=as.POSIXct(c('2016-04-26 13:28:19','2016-04-26 13:28:19')),
    stringsAsFactors=F
);

cmb <- combn(seq_len(ncol(df)-1L)+1L,2L);
res <- abs(apply(cmb,2L,function(x) difftime(df[[x[1L]]],df[[x[2L]]],units='hours')));
colnames(res) <- apply(cmb,2L,function(x,cns) paste0(cns[x[1L]],'_',cns[x[2L]]),names(df));
res;
##      Date1_Date2 Date1_Date3 Date2_Date3
## [1,]    4.763889    28.21389       23.45
## [2,]    4.763889    28.21389       23.45
bgoldst
  • 34,190
  • 6
  • 38
  • 64
  • 1
    Thank you that works. I'm having a few problems apply to my real data. I think because I have more ID columns to group by. Does "-1L" in the cmb remove the ID col? – Pete900 May 19 '16 at 10:19
  • 1
    Yes, the `-1L` removes the `ID` column since it must not participate in the combinations computation. Regarding grouping, I assumed you did not need to group by `ID` for this task because your example code does not group by `ID` (referring to your manual solution `df$Date2_Date1 <- difftime(df$Date2,df$Date1, units = c("hours"))`). – bgoldst May 19 '16 at 10:21
  • 1
    Yes you are right I don't need to group by anything. I just need to adapt the code to be "-2L". My fault I should have posted exactly by format type. – Pete900 May 19 '16 at 10:23
  • 1
    Almost. I just get this error: Error in as.POSIXct.numeric(time1) : 'origin' must be supplied, when I run "res". But I think that must be a date conversion issue. I'm sure I'll find in on SO. – Pete900 May 19 '16 at 10:25