1

I found a answer on SO, but that doesn't works for me. What I want is as follows:

c_id    c_time
1    2012-08-15 00:00:30
1    2012-08-15 00:01:21
1    2012-08-15 00:01:25
2    2012-08-15 00:02:40
2    2012-08-15 00:03:41

I would like to get a time difference between first and last appearance of a c_id variable in hours, minutes or seconds.

c_diff
00:00:55 
00:01:01

Any help is highly appreciated!

Community
  • 1
  • 1
Makaroni
  • 880
  • 3
  • 15
  • 34

3 Answers3

4

This is a simple matter of grouping the times by the different c_id and taking the difference between the max and min. There are several ways of doing this, I prefer using the group_by function for the dplyr package.

#sample data    
#dput(df)
    df<-df<-structure(list(c_id = c(1, 1, 1, 2, 2), c_time = structure(c(1344988830, 
        1344988881, 1344988885, 1344988960, 1344989021), class = c("POSIXct", 
    "POSIXt"), tzone = "GMT")), .Names = c("c_id", "c_time"), row.names = c(NA, 
       -5L), class = "data.frame")


library(dplyr)
out<-summarize(group_by(df, c_id), delta=difftime(max(c_time), min(c_time), units = "mins"))

To convert the column delta to your desired format, you need to use the chron package:

library(chron)
zero <- structure(0, units = "secs", class = "difftime")
dd.day <- as.vector((out$delta + zero)/(24*60*60))
out$delta<-times(dd.day) 

out
# Source: local data frame [2 x 2]
# 
#    c_id    delta
#   (dbl)   (tims)
# 1     1 00:00:55
# 2     2 00:01:01

There should have been an easier way, but I found the answer on a post from 2005 (http://grokbase.com/t/r/r-help/055n2qa21v/r-print-format-for-difftime)

Dave2e
  • 22,192
  • 18
  • 42
  • 50
3

tapply might help.

>c_id <- c(1,1,1,2,2)
>c_time <- as.POSIXct(c("2012-08-15 00:00:30","2012-08-15 00:01:21","2012-08-15 00:01:25","2012-08-15 00:02:40","2012-08-15 00:03:41"))
>c_diff <- tapply(c_time, c_id, function(x) max(as.numeric(x))-min(as.numeric(x)))
>c_diff
 1  2 
55 61 
Shiryu
  • 31
  • 3
2

Getting the difference is easy: just subtract, i.e. max(df$c_time) - min(df$c_time). However, you might find units get wonky when applying this to multiple subsets, so using difftime explicitly so you can set units is a good idea. In aggregate:

aggregate(c_time ~ c_id, df, function(x){difftime(max(x), min(x), units ='secs')})
#  c_id c_time
# 1    1    55 
# 2    2    61 

However, this isn't the time format you wanted. chron has a nice times class, assuming you don't just want to work with strings. It subtracts better than difftime:

aggregate(c_time ~ c_id, df, function(x){x <- chron::as.chron(x); max(x) - min(x)})
#   c_id   c_time
# 1    1 00:00:55
# 2    2 00:01:01

# or in dplyr
library(dplyr)

df %>% mutate(c_time = chron::as.chron(c_time)) %>% 
    group_by(c_id) %>% 
    summarise(range = max(c_time) - min(c_time))
# Source: local data frame [2 x 2]
# 
#    c_id    range
#   (int)   (tims)
# 1     1 00:00:55
# 2     2 00:01:01

# or data.table
library(data.table)

setDT(df)[, .(c_id, c_time = chron::as.chron(c_time))
          ][, .(range = max(c_time) - min(c_time)), by = c_id]
#    c_id    range
# 1:    1 00:00:55
# 2:    2 00:01:01

Or use data.table's ITime instead of chron::times, though it returns seconds when subtracted:

setDT(df)[, .(c_id, c_time = as.ITime(c_time))
          ][, .(range = as.ITime(as.POSIXct(max(c_time) - min(c_time), 
                                            'UTC', origin))), by = c_id]
#    c_id    range
# 1:    1 00:00:55
# 2:    2 00:01:01

If you do in fact want strings instead of a formatted time, you can actually keep it in all base R:

aggregate(c_time ~ c_id, df, 
          function(x){format(as.POSIXct(as.numeric(difftime(max(x), 
                                                            min(x), 
                                                            units ='secs')), 
                                        'UTC', origin = origin), 
                             '%T')})
#   c_id   c_time
# 1    1 00:00:55
# 2    2 00:01:01
alistaire
  • 42,459
  • 4
  • 77
  • 117