1

I have a time series data frame in R that has a column, V1, which consists of integers with a few NAs interspersed throughout. I want to iterate over this column and subtract V1 from itself one time step previously. However, I want to ignore the NA values in V1 and use the last non-NA value in the subtraction. If the current value of V1 is NA, then the difference should return NA. See below for an example

V1 <- c(1, 3, 4, NA, NA, 6, 9, NA, 10)
time <- 1:length(V1)
dat <- data.frame(time = time,
                     V1 = V1)
lag_diff <- c(NA, 2, 1, NA, NA, 2, 3, NA, 1) # The result I want
diff(dat$V1) # Not the result I want

I'd prefer not to do this with loops because I have hundreds of data frames, each with >10,000 rows.

My first thought to solve this was to filter out the NA rows, perform the iterative difference calculation and then reinsert the rows that were filtered out but I can't think of a way to do that. It doesn't seem very "tidy" to do it that way either and I'm not sure it would be faster than looping. Any help is appreciated, bonus points if the solution uses tidyverse functions.

Jake
  • 196
  • 1
  • 18

2 Answers2

3
dat[!is.na(dat$V1), 'lag_diff'] <- c(NA, diff(dat[!is.na(dat$V1), 'V1']))
#   time V1 lag_diff
# 1    1  1       NA
# 2    2  3        2
# 3    3  4        1
# 4    4 NA       NA
# 5    5 NA       NA
# 6    6  6        2
# 7    7  9        3
# 8    8 NA       NA
# 9    9 10        1

Or with data.table (same result)

library(data.table)
setDT(dat)

dat[!is.na(V1), lag_diff := V1 - shift(V1)]

#    time V1 lag_diff
# 1:    1  1       NA
# 2:    2  3        2
# 3:    3  4        1
# 4:    4 NA       NA
# 5:    5 NA       NA
# 6:    6  6        2
# 7:    7  9        3
# 8:    8 NA       NA
# 9:    9 10        1
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
  • Wow. I can't believe it was so simple. I certainly don't need any tidyverse functions since you were able to do it with the base indexing. – Jake Aug 28 '18 at 16:52
1

A tidyverse version, just in case. It does need a filter though

dat %>% 
  filter(!is.na(V1)) %>% 
  mutate(diff=V1- lag(V1)) %>% 
  right_join(dat,by=c("time","V1"))
ashleych
  • 1,042
  • 8
  • 25