
I have a column in a data frame that I created in R. After a certain month, the values become NA. I would like to replace each NA with the value recorded 12 months earlier. Is there a function in R that does this, or do I have to write a loop?

So Jan-11 would then become 10, Feb-11 would become 11 and so forth.

EDIT:

I also tried:

for (i in 1:length(df$var)) {
  df$var[i] <- ifelse(is.na(df$var[i]), df$var[i - 12], df$var[i])
}

but the whole column ends up being NA.


ali.hash
  • `library(dplyr); df %>% mutate(value2 = if_else(is.na(value), lag(value, 12), value))` – Jon Spring Nov 08 '18 at 18:47
  • Possible duplicate of [Replacing NAs with latest non-NA value](https://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value) – Jon Spring Nov 08 '18 at 20:21
  • Hi Jon. Thanks for your help, but this does not work. Jan-11 will show the value 10, but when it comes to Jan-12, it shows NA (when it should be 10). Are there options that I am supposed to add to the lag function to do this? Thanks – ali.hash Nov 09 '18 at 01:24

2 Answers


Aha, from the last comment it sounds like you'd like a "chained" lag, where it uses the last value of that month that is available, however many years back you need to go.

Jan-11 will show the value 10, but when it comes to Jan-12, it shows NA (when it should be 10).
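To see why a single 12-month lag is not enough, here is a minimal base R sketch. The vector `x` is made up for illustration (one year of values followed by two years of NAs), and `lag12` is just a hand-rolled stand-in for `dplyr::lag(x, 12)`.

```r
# Illustrative data: 12 known values, then 24 missing ones.
x <- c(10:21, rep(NA_integer_, 24))

# Shift the vector down by 12 positions (base R equivalent of lag(x, 12)).
lag12 <- c(rep(NA_integer_, 12), head(x, -12))

# Replace each NA with the value from 12 rows earlier, applied once.
once <- ifelse(is.na(x), lag12, x)

once[13]  # 10: the second January picks up the first January
once[25]  # NA: the third January looks back at a row that was itself NA
```

The lag is computed from the original column, so year 3 only ever sees the unfilled year 2. That is the gap the grouped fill is meant to close.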

Here's an approach that relies on first grouping by month, and then using tidyr::fill() to fill in from the last valid value for that month.

First, some fake data. (BTW it would be useful to include something like this in your question so that answerers don't have to retype your numbers or generate new ones.)

# Make fake data with 1 year values, 2 yrs NAs
library(lubridate)
set.seed(42)
data <- data.frame(
  dates = seq.Date(from = ymd(20100101), to = ymd(20121201), by = "month"),
  values = c(as.integer(rnorm(12, 10, 3)), rep(NA_integer_, 24))
)

# Group by months, fill within groups, ungroup.
library(tidyverse)
data_filled <- data %>%
  group_by(month = month(dates)) %>%
  fill(values) %>%
  ungroup() %>%
  arrange(dates)
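If you would rather avoid the tidyverse, the same group-then-fill idea can be sketched in base R with `ave()`. This is only a sketch under a few assumptions: the rows are already sorted by date, `fill_down` is a hypothetical helper written for this example (not a base R function), and the values are chosen to match the numbers in the question.

```r
# Hypothetical helper: carry the last non-NA value forward within a vector.
fill_down <- function(x) {
  # Position of the most recent non-NA element seen so far (0 if none yet).
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA  # leading NAs have nothing to copy from
  x[idx]
}

# Illustrative data, assumed to be sorted by date.
dates  <- seq(as.Date("2010-01-01"), as.Date("2012-12-01"), by = "month")
values <- c(10:21, rep(NA_integer_, 24))

# Group by calendar month and fill forward within each group.
month_id <- as.integer(format(dates, "%m"))
filled   <- ave(values, month_id, FUN = fill_down)

filled[c(13, 25)]  # 10 10: both later Januaries take the 2010 value
```

Because `ave()` returns results in the original row order, no re-sorting is needed afterwards.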
Jon Spring

I can't think of a way to do this without a loop, but this should give you what you need:

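df <- data.frame(col1 = LETTERS[1:24],
                 col2 = c(rnorm(12), rep(NA, 12)))
# Walk down the column in order; because earlier rows are filled first,
# a value is carried forward across several years if needed.
for (i in seq_len(nrow(df))) {
  # Only look back when a row 12 positions earlier actually exists.
  if (i > 12 && is.na(df[i, 2])) {
    df[i, 2] <- df[i - 12, 2]
  }
}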
Cleland