1

I am trying to find the average of the date of the row above and of the row below. However, lag(Date) and lead(Date) are producing vectors which mean() then refuses to work with returning NA instead (figuring that out took me faaaaaar too long).

df <- data_frame(Date = as.Date(c("2020-01-01", NA, "2020-12-12")))
df <- mutate(df, Date = replace_na(Date, mean(c(lag(Date),lead(Date)))))

This gives ("2020-01-01", NA, "2020-12-12") whereas what I want is ("2020-01-01, "2020-06-06", "2020-12-12")

So how do I access the previous and next row value for Date so that I can generate an average?

vorpal
  • 268
  • 4
  • 17
  • I reccomend that you use a slide function instead, check out the tsibble package – Bruno Apr 27 '20 at 12:49
  • try `mean(......na.rm=TRUE)` – Frank Zhang Apr 27 '20 at 12:58
  • So the na.rm=TRUE works (for my minimal example, but not my real problem), but I don't understand why. Presumably lag and lead still produce vectors or am I just completely misunderstanding what is going on. – vorpal Apr 27 '20 at 23:04

2 Answers2

2

data_frame is deprecated so we have replaced that with data.frame. tibble would be an alternative but involves an additional dependency. Use na.approx in zoo and convert it back to Date class as it produces a numeric output. This also works to interpolate multiple NAs in a row. That would not work if we used lead and lag. If there are NAs at the ends this leaves them as is or we could use different arguments to na.approx to fill those in too; however, in the example data there are no such NAs so we leave it as shown.

(Note that the correct halfway point is as shown below rather than as shown in the question. There are 173 days between 2020-01-01 and 2020-06-22 and also between 2020-06-22 and 2020-12-12.)

library(dplyr)
library(zoo)

df <- data.frame(Date = as.Date(c("2020-01-01", NA, "2020-12-12"))) # test input

mutate(df, Date = as.Date(na.approx(Date, na.rm = FALSE)))

giving:

        Date
1 2020-01-01
2 2020-06-22
3 2020-12-12
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • I have no idea why, but while the other answers worked for my minimal example, they didn't work for my real problem. However this one using zoo did, for reasons I can't get my head around. – vorpal Apr 27 '20 at 23:02
1

Base R one liner solving your sample data (suspecting you have multiple instances of NA -- see solution below):

df$Date <- ifelse(is.na(df$Date), mean(df$Date, na.rm = TRUE), df$Date)

Interpolating dates:

df$Date <- as.Date(ifelse(
  is.na(df$Date),
  approx(as.numeric(df$Date), method = "linear", n = nrow(df))$y[which(is.na(df$Date))]
  ,
  df$Date
),
origin = as.Date("1970-01-01", "%Y-%m-%d"),
"%d-%m-%Y")

Data used:

df <- data.frame(Date = as.Date(c("2020-01-01", NA, "2020-12-12")))
hello_friend
  • 5,682
  • 1
  • 11
  • 15
  • Note: the interpolation won't work properly if you have NAs at the beginning or end of your date vector. For a solution involving interpolating/extrapolating those values please see my answer here: https://stackoverflow.com/questions/7188807/interpolate-na-values/60143410#60143410 – hello_friend Apr 27 '20 at 13:24