0

I'm trying to adapt the answer to my previous question (Difference between dates in many columns in R). I've realised I only want the time difference between a given column, and the column immediately to it's left. Example for clarification:

df <- data.frame(
  Group=c("A","B"),
  ID=c(1,2),
  Date1=as.POSIXct(c('2016-04-25 09:15:29','2016-04-25 09:15:29')),
  Date2=as.POSIXct(c('2016-04-25 14:01:19','2016-04-25 14:01:19')),
  Date3=as.POSIXct(c('2016-04-26 13:28:19','2016-04-26 13:28:19')),
  stringsAsFactors=F
)

My desired output is Date2-Date1 and Date3-Date2. And this of course would extend for many columns i.e. Date4-Date3 etc. But I do not need Date3-Date1. To clarify, how can I automate this for many columns

df$Date2_Date1 <- difftime(df$Date2,df$Date1, units = c("hours"))
df$Date3_Date2 <- difftime(df$Date3,df$Date2, units = c("hours"))

Thanks to @bgoldst for the original answer. I think I just need to adapt cmb below to have the correct sequence:

cmb <- combn(seq_len(ncol(df)-1L)+1L,2L);
res <- abs(apply(cmb,2L,function(x) difftime(df[[x[1L]]],df[[x[2L]]],units='hours')));
colnames(res) <- apply(cmb,2L,function(x,cns) paste0(cns[x[1L]],'_',cns[x[2L]]),names(df))

Thanks

Community
  • 1
  • 1
Pete900
  • 2,016
  • 1
  • 21
  • 44

2 Answers2

2

Given your example, this should to the trick:

df <- data.frame(
  Group=c("A","B"),
  ID=c(1,2),
  Date1=as.POSIXct(c('2016-04-25 09:15:29','2016-04-25 09:15:29')),
  Date2=as.POSIXct(c('2016-04-25 14:01:19','2016-04-25 14:01:19')),
  Date3=as.POSIXct(c('2016-04-26 13:28:19','2016-04-26 13:28:19')),
  stringsAsFactors=F
)
mapply(difftime, df[, 4:5], df[, 3:4], units = "hours")

> Date2 Date3
> [1,] 4.763889 23.45
> [2,] 4.763889 23.45

In my call mapply applies function difftime to the two arrays provided, so it starts with df[, 4] - df[, 3], then df[, 5] - df[, 4]. You of course have to change this with the column numbers for your dates, and make sure they are ordered in the right way.

Good luck!

Jasper
  • 555
  • 2
  • 12
  • You can find the appropriate column with `which` and then calculate adjacent column by simply adding/subtracting up/down the data.frame. – Roman Luštrik May 20 '16 at 08:58
  • Thanks Jasper, this worked very nicely. Roman, would you mind clarifying with a little bit of example code so I can try. – Pete900 May 20 '16 at 09:07
1

You could use Non-Standard Evaluation:

  1. First you create a character vector with the name of the columns containing the dates. So let' say all the columns starting with 'Date'

    dates = names(df)[grepl("^Date", names(df))]
    
  2. We create a list of formulas that dynamically calculate the difference between to adjacent columns:

    all_operations = lapply(seq_len(length(dates) - 1), function(i){
        as.formula(paste("~difftime(", dates[i + 1], ",", dates[i],", units = c('hours'))"))
    })
    

    this will create the formulas:

    [[1]]:  ~difftime(Date2, Date1, units = c("hours"))
    [[2]]:  ~difftime(Date3, Date2, units = c("hours"))
    
  3. Then you can use dplyr's NSE mutate_ to apply the dynamic formulas generated above:

    df %>%
       mutate_(.dots = setNames(all_operations, paste0("Diff", seq_len(length(dates) - 1))))
    
Lorenzo Rossi
  • 1,481
  • 1
  • 9
  • 16
  • Thanks Lorenzo. I wish I could accept two answers. This also worked very nicely! The only reason I accepted Jaspers is because I understood it more which is perhaps not a good reason. – Pete900 May 20 '16 at 09:13