0

How can we fill/forward pad a time series in R but only if the last value is within the last 12 months/observations otherwise NA?

Sample data: Variable is the original and desired is the desired outcome. We observe NAs from June 2016 but we will forward pad. I only want to do this for 12 months so as soon as we reach July 2017 the last non-NA is too long a go and should be NA. That's why something like fill() alone will not do

example

minimum working example: Consider the below using max gap and na.locf

x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA)
zoo::na.locf(x,  maxgap = 2, na.rm = FALSE)

Instead of not filling anything when number of NAs> max gap I would like the below output: NA,1,2,3,3,3,5,6,7,7,7,NA. So if I specify gap =2 I would want at most two values filled and any more NAs should stay NA

Kathi
  • 99
  • 1
  • 5
  • What does _fill/forward pad a time series_ means? Can you add an example of your desired outcome so we can help better? – onlyphantom May 25 '18 at 04:28
  • he probably have missing dates in time series data and want to insert dates to create uniform time steps. But last 12 months from which date? – Mankind_008 May 25 '18 at 04:40
  • The data already has uniform time steps. I want to fill the NAs, but if the last available value is longer than 12 months ago set NA instead – Kathi May 25 '18 at 04:41
  • @Kathi Please add some sample data so that people can help you easily. You should expect others to spent time on creating sample data for you. Please note `image` doesnt help as it will require someone to typing. – MKR May 25 '18 at 05:44

3 Answers3

0

You could do it like this:

my_fill <- function(x,maxgap=2){
  y<-x
  counter<-0
  for(i in 2:length(y)){
    if(is.na(y[i] & counter<maxgap)){
      y[i]<-y[i-1]
      counter<-counter+1
    }else{
      counter<-0
    }
  }
  return(y)
}

x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA)
my_fill(x)
otwtm
  • 1,779
  • 1
  • 16
  • 27
0

An option is to use tidyr::fill. The approach is to create columns as desired and TempDate in such a way that desired will have same value as variable but rows with "" (blank) value for variable will have desired as NA. Similarly TempDate will have same value as date but it will have NA for rows where variable got "" values.

fill both desired and TempDate and replace desired to NA where TempDate is older by more than 12 months than date.

library(tidyverse)
library(lubridate)

df %>% mutate(TempDate = as.Date(ifelse(variable=="", NA, date),origin = "1970-01-01"),
              desired = ifelse(variable=="",NA, variable)) %>%
  fill(desired, TempDate) %>%
  mutate(desired = ifelse(date > (TempDate +months(12)), NA, desired)) %>%
  select(-TempDate)

#          date variable desired
# 1  2016-01-01        1       1
# 2  2016-02-01        2       2
# 3  2016-03-01        3       3
# 4  2016-04-01        3       3
# 5  2016-05-01        3       3
# 6  2016-06-01       33      33
# 7  2016-07-01               33
# 8  2016-08-01               33
# 9  2016-09-01               33
# 10 2016-10-01               33
# 11 2016-11-01               33
# 12 2016-12-01               33
# 13 2017-01-01               33
# 14 2017-02-01               33
# 15 2017-03-01               33
# 16 2017-04-01               33
# 17 2017-05-01               33
# 18 2017-06-01               33
# 19 2017-07-01             <NA>
# 20 2017-08-01             <NA>
# 21 2017-09-01       34      34
# 22 2017-10-01               34

Data: Based on image shared by OP

df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by="month"),
           variable = c(1,2,3,3,3,33,rep("",14),34,""), stringsAsFactors = FALSE)

df
#          date variable
# 1  2016-01-01        1
# 2  2016-02-01        2
# 3  2016-03-01        3
# 4  2016-04-01        3
# 5  2016-05-01        3
# 6  2016-06-01       33
# 7  2016-07-01         
# 8  2016-08-01         
# 9  2016-09-01         
# 10 2016-10-01         
# 11 2016-11-01         
# 12 2016-12-01         
# 13 2017-01-01         
# 14 2017-02-01         
# 15 2017-03-01         
# 16 2017-04-01         
# 17 2017-05-01         
# 18 2017-06-01         
# 19 2017-07-01         
# 20 2017-08-01         
# 21 2017-09-01       34
# 22 2017-10-01         
MKR
  • 19,739
  • 4
  • 23
  • 33
0

Here is a one-liner using base R. k specifies the gap size. x is defined in the minimum working example in the question.

cumsum(!is.na(x)) produces a grouping vector in which each non-NA followed by any NAs are regarded as one group. Then for each such group replace the first k+1 elements (or the number of elements in the group if less) with the first element of the group leaving the remaining elements as NA.

k <- 2
ave(x, cumsum(!is.na(x)), FUN = function(x) replace(x, 1:min(length(x), k+1), x[1]))
## [1] NA  1  2  3  3  3  5  6  7  7  7 NA

Note that this idea is also used in this SO answer Change maxgap for number of times a value is carried forward involving data.table.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341