1

I have an irregular time series data frame with time (seconds) and value columns. I want to add another column, value_2, whose values are led by delay seconds: value_2 at time t equals value at time t + delay, or at the first available time after that.

ts=data.frame(
  time=c(1,2,3,5,8,10,11,15,20,23),
  value=c(1,2,3,4,5,6,7,8,9,10)
)

ts_with_delayed_value <- add_delayed_value(ts, "value", 2, "time")

> ts_with_delayed_value
   time value value_2
1     1     1       3
2     2     2       4
3     3     3       4
4     5     4       5
5     8     5       6
6    10     6       8
7    11     7       8
8    15     8       9
9    20     9      10
10   23    10      10

I have my own version of this function add_delayed_value, here it is:

add_delayed_value <- function(data, colname, delay, colname_time) {
  colname_delayed <- paste(colname, sprintf("%d", delay), sep="_")
  data[colname_delayed] <- NaN

  for (i in 1:nrow(data)) {
    time_delayed <- data[i, colname_time] + delay
    value_delayed <- data[data[colname_time] >= time_delayed, colname][1]
    if (is.na(value_delayed)) {
      value_delayed <- data[i, colname]
    }
    data[i, colname_delayed] <- value_delayed
  }

  return(data)
}

Is there a way to vectorize this routine to avoid the slow loop?

I'm quite new to R, so this code probably has lots of issues. What can be improved about it?

ak.

4 Answers

2

You could try:

library(dplyr)
library(zoo)
na.locf(ts$value[sapply(ts$time, function(x) min(which(ts$time - x >= 2)))])
[1]  3  4  4  5  6  8  8  9 10 10
DatamineR
  • This blows up when `min` takes an empty column. For the latest time entry the following will return an empty column: `which(ts$time - latest_time >= 2)`. How did this work for you? – ak. Apr 20 '16 at 21:09
  • Oh, nevermind - it was just a warning message. It did work. – ak. Apr 20 '16 at 21:46
  • That being said, is it possible to mute `min` when it receives an empty column from `which`? `> min(c())` Warning message: In min(c()) : no non-missing arguments to min; returning Inf – ak. Apr 20 '16 at 21:53
  • 2
    Also, this dependency `library(dplyr)` isn't really necessary, right? – ak. Apr 20 '16 at 22:05
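On the warning raised in the comments above: one way to avoid calling `min` on an empty vector is to look up positions with base R's `findInterval` instead. This is only a sketch, not the answerer's method. The function name `add_delayed_value_vec` is made up for illustration, and the code assumes the time column is sorted in ascending order. Rows with no later observation fall back to their own value, as in the loop from the question (which differs from `na.locf`, which carries the previous result forward).

```r
# Hypothetical vectorized variant of add_delayed_value() from the question.
# Assumption: data[[colname_time]] is sorted in ascending order.
add_delayed_value_vec <- function(data, colname, delay, colname_time) {
  time  <- data[[colname_time]]
  value <- data[[colname]]
  n     <- length(time)

  # With left.open = TRUE, findInterval() counts the times strictly below each
  # target, so adding 1 gives the first row whose time is >= time + delay.
  idx <- findInterval(time + delay, time, left.open = TRUE) + 1L

  # Targets beyond the last observation have no match; keep the row's own value.
  past_end <- idx > n
  delayed <- value[pmin(idx, n)]
  delayed[past_end] <- value[past_end]

  data[[paste(colname, delay, sep = "_")]] <- delayed
  data
}

ts <- data.frame(
  time  = c(1, 2, 3, 5, 8, 10, 11, 15, 20, 23),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
out <- add_delayed_value_vec(ts, "value", 2, "time")
out$value_2
# [1]  3  4  4  5  6  8  8  9 10 10
```

No packages are needed, and the lookup is a binary search per row rather than a scan of the whole column.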
1

What you want is not clear; please give pseudocode or a formula. It looks like this is what you want. From what I understand, the last value should be NA.

library(data.table)
setDT(ts,key='time')
ts_delayed = ts[,.(time_delayed=time+2)]
setkey(ts_delayed,time_delayed)
ts[ts_delayed,roll=-Inf]
statquant
  • I didn't read the question, but if that's the answer, I guess data.table isn't needed. By the way, you probably don't want `dt = setDT(df)`, since now `dt` and `df` are the same object. `set*` modifies by reference. – Frank Apr 20 '16 at 20:49
  • 1
    Or: `ts[, value2 := ts[.(time=time+2L), value, roll=-Inf, rollends=TRUE, mult="first", on="time"]]` – Arun Apr 20 '16 at 22:03
  • 3
    Lol just got " Aruned " – statquant Apr 20 '16 at 22:05
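The rolling join in this answer can be combined with the fallback from the question. The following is a minimal sketch, assuming the data.table package is installed. The names `dt`, `lookup`, and `rolled` are illustrative only, and `as.data.table()` is used so the original data frame is copied rather than modified by reference. Rows with no later observation keep their own value, which is the behaviour of the loop in the question, not the NA this answer proposes.

```r
library(data.table)

ts <- data.frame(
  time  = c(1, 2, 3, 5, 8, 10, 11, 15, 20, 23),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
delay <- 2

# as.data.table() copies, so `ts` itself is left untouched.
dt <- as.data.table(ts)

# One lookup time per row: the time whose value we want to pull back.
lookup <- data.table(time = dt$time + delay)

# roll = -Inf rolls the next observation backward onto each lookup time;
# lookup times after the last observation come back as NA.
rolled <- dt[lookup, on = "time", roll = -Inf]$value

dt[, value_2 := rolled]
# Assumed fallback (from the question's loop): keep the row's own value.
dt[is.na(value_2), value_2 := value]
dt
```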
0

This should work for your data. If you want to make a general function, you'll have to play around with lazyeval, which honestly might not be worth it.

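library(dplyr)
library(zoo)

delay <- 2

# Next observation carried backward: fill each NA with the next known value.
carry_back <- . %>% na.locf(na.rm = TRUE, fromLast = TRUE)

# Build a regular 1-second grid (assumes integer-valued times), fill the gaps
# backward, shift the grid by `delay`, and join back onto the original rows.
tibble(time = with(ts, seq(first(time), last(time)))) %>%
  left_join(ts, by = "time") %>%
  transmute(value_2 = carry_back(value),
            time = time - delay) %>%
  right_join(ts, by = "time") %>%
  # Rows with no observation at or after time + delay get the last value.
  mutate(value_2 = ifelse(is.na(value_2), last(value), value_2))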
bramtayl
0

collapse::flag supports fast lagging of irregular time series and panels; see also my answer here. To get your exact result, you would have to fill the missing values introduced by flag with a function such as data.table::nafill with the option type = "locf". Combining these two functions is likely to be more parsimonious and efficient than the solutions suggested previously.

Sebastian