0

In df i would like to replace the NA values with the previous non-NA value for each id

id<-c(1,1,1,1,2,2,2)
purchase<-c(20,NA,NA,10,NA,NA,5)
df<-data.frame(id,purchase)

 id purchase
  1       20
  1       NA
  1       NA
  1       10
  2       NA
  2       NA
  2        5

The output should ideally look like:

id purchase
  1       20
  1       20
  1       20
  1       10
  2       NA
  2       NA
  2        5

I am aware of Replacing NAs with latest non-NA value, but it does not do it per group. Any help would be appreciated.

AliCivil
  • 2,003
  • 6
  • 28
  • 43
  • 2
    You can achieve it with a combination of `group_by()` from `dplyr` and `fill()` from `tidyr`. – tmfmnk Feb 06 '19 at 20:38

1 Answers1

1

Three ways (so far), all utilizing zoo::na.locf by per group. One thing to note is that you need na.rm=FALSE, otherwise zoo::na.locf may return a shortened vector (as is the case where id is 2).

Base R

do.call("rbind.data.frame",
        by(df, df$id, function(x) transform(x, purchase = zoo::na.locf(purchase, na.rm=FALSE))))
#     id purchase
# 1.1  1       20
# 1.2  1       20
# 1.3  1       20
# 1.4  1       10
# 2.5  2       NA
# 2.6  2       NA
# 2.7  2        5

dplyr

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(purchase = zoo::na.locf(purchase, na.rm = FALSE))

data.table

library(data.table)

DT <- as.data.table(df)
DT[, purchase := zoo::na.locf(purchase, na.rm = FALSE), by = "id" ]
r2evans
  • 141,215
  • 6
  • 77
  • 149