The OP has requested that the missing values be filled in by group, so the zoo::na.locf() approach might fail here.
An update join can be used to fill in the missing values per group:
library(data.table) # version 1.10.4 used
setDT(DT)
DT[DT[!is.na(V1)][order(V2), .(fillin = first(V2)), by = V1], on = "V1", V2 := fillin][]
#     V1    V2
#  1:  1  tree
#  2:  1  tree
#  3:  1  tree
#  4:  2 house
#  5:  2 house
#  6:  2 house
#  7:  3  lawn
#  8:  3  lawn
#  9:  4    NA
# 10:  4    NA
# 11: NA    NA
# 12: NA  tree
Note that the input data have been supplemented to cover some corner cases.
Explanation
The approach consists of two steps. First, the values to be filled in are determined for each group. Then, the update join modifies DT in place using these values.
fill_by_group <- DT[!is.na(V1)][order(V2), .(fillin = first(V2)), by = V1]
fill_by_group
#    V1 fillin
# 1:  2  house
# 2:  3   lawn
# 3:  1   tree
# 4:  4     NA
DT[fill_by_group, on = "V1", V2 := fillin][]
order(V2) ensures that any NA values are sorted last, so that first(V2) picks the correct value to fill in.
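The effect of sorting can be seen on a single group. This is a minimal sketch in base R; the vectors x and y are made-up stand-ins for the V2 values of one group and are not part of the original answer:

```r
# Hypothetical stand-in for the V2 values of a group with a leading NA
x <- c(NA, "lawn")

# Without sorting, the first element is the NA itself
x[1]
# [1] NA

# order() places NA last by default (na.last = TRUE), so the first
# element of the sorted vector is a non-NA value whenever one exists
x[order(x)][1]
# [1] "lawn"

# If every value is NA, the result is still NA (as for group 4)
y <- c(NA_character_, NA_character_)
y[order(y)][1]
# [1] NA
```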
The update join approach has been benchmarked as the fastest method in another case.
Variant using na.omit()
docendo discimus has suggested in his comment to use na.omit(). This can be used in the update join as well, replacing order()/first():
DT[DT[!is.na(V1), .(fillin = na.omit(V2)), by = V1], on = "V1", V2 := fillin][]
Note that na.omit(V2) works here just as well as na.omit(V2)[1] or first(na.omit(V2)).
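To see what the na.omit() variant joins on, the lookup table can be inspected on its own. The following is only a sketch: the name lookup and the unique() step are illustrative additions rather than part of the original answer, and it assumes, as in the sample data, that all non-NA V2 values within a group are identical.

```r
library(data.table)

# Same sample data as in the Data section, built with data.table() here
DT <- data.table(
  V1 = c(rep(1:4, times = c(3, 3, 2, 2)), NA, NA),
  V2 = c("tree", NA, "tree", "house", "house", NA,
         NA, "lawn", NA, NA, NA, "tree")
)

# na.omit(V2) keeps every non-NA value, so a group can contribute
# several rows, while an all-NA group (V1 == 4) contributes none
lookup <- DT[!is.na(V1), .(fillin = na.omit(V2)), by = V1]
nrow(lookup)
# [1] 5

# Optional: reduce the lookup to one row per group before joining
# (assumes the non-NA values within a group do not differ)
lookup <- unique(lookup)

# Update join: rows of DT without a match in lookup keep their V2
DT[lookup, on = "V1", V2 := fillin]
DT$V2
```

Because group 4 has no row in the lookup table, its V2 values are simply left untouched and stay NA.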
Data
Edit: The OP has changed his originally posted data set substantially. As a quick fix, I've updated the sample data below to include cases where V1 is NA.
library(data.table)
DT <- fread(
"1 tree
1 NA
1 tree
2 house
2 house
2 NA
3 NA
3 lawn
4 NA
4 NA
NA NA
NA tree")
Note that the data given by the OP have been supplemented to cover three additional cases:
- The first V2 value in a group is NA.
- All V2 values in a group are NA.
- V1 is NA.
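The corner cases can be checked one by one after the fill. This is a minimal sketch that rebuilds the sample data with data.table() instead of fread() (an assumption made to keep the snippet self-contained) and applies the order()/first() variant:

```r
library(data.table)

# Sample data rebuilt with data.table(); V1 is integer, V2 is character
DT <- data.table(
  V1 = c(rep(1:4, times = c(3, 3, 2, 2)), NA, NA),
  V2 = c("tree", NA, "tree", "house", "house", NA,
         NA, "lawn", NA, NA, NA, "tree")
)

# Step 1: one fill-in value per group, NA sorted last within V2
fill_by_group <- DT[!is.na(V1)][order(V2), .(fillin = first(V2)), by = V1]

# Step 2: update join, modifies DT in place
DT[fill_by_group, on = "V1", V2 := fillin]

# Case 1: the leading NA in group 3 has been filled
stopifnot(identical(DT[V1 == 3L, V2], c("lawn", "lawn")))

# Case 2: group 4 has no non-NA value, so it stays NA
stopifnot(all(is.na(DT[V1 == 4L, V2])))

# Case 3: rows where V1 is NA are left as they were
stopifnot(identical(DT[is.na(V1), V2], c(NA, "tree")))
```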