I want to fill the NA values in a data.table with the mean of the same column, computed within each group defined by another column (here, Plan). The intended output is shown below. How can I achieve this in R? I would prefer a data.table solution.

library(data.table)
data2 <- data.table(Plan=c(11,11,11,11,91,91,91,91), Price=c(4.4,4.4,4.4,NA,3.22,3.22,3.22,NA), factor=c(0.17,0.17,0.17,NA,0.15,0.15,0.15,NA), Type=c(4,4,4,4,3,3,3,3))

data2
   Plan Price factor Type
1:   11  4.40   0.17    4
2:   11  4.40   0.17    4
3:   11  4.40   0.17    4
4:   11    NA     NA    4
5:   91  3.22   0.15    3
6:   91  3.22   0.15    3
7:   91  3.22   0.15    3
8:   91    NA     NA    3



Intended output
       Plan Price factor Type
    1:   11  4.40   0.17    4
    2:   11  4.40   0.17    4
    3:   11  4.40   0.17    4
    4:   11  4.40   0.17    4
    5:   91  3.22   0.15    3
    6:   91  3.22   0.15    3
    7:   91  3.22   0.15    3
    8:   91  3.22   0.15    3
NUdu
  • https://stackoverflow.com/questions/34124291/data-table-replace-na-with-mean-for-multiple-columns-and-by-id – M-- Dec 11 '19 at 16:21
  • I'm not sure how much this matters, but in your example each group has all the same values for each column, so the fact that you want a mean won't actually be evident – camille Dec 11 '19 at 16:32
  • [Possible duplicate](https://stackoverflow.com/a/58593144/2204410) – Jaap Dec 11 '19 at 17:32

1 Answer

We can use na.locf, grouped by 'Plan', to replace each NA with the preceding non-NA value

library(zoo)
# na.rm = FALSE keeps any leading NA so the result has the same length as the group
data2[, factor := na.locf(factor, na.rm = FALSE), by = Plan]

If we need the group mean instead, use na.aggregate, which by default replaces each NA with the mean of the non-NA values

data2[, factor := na.aggregate(factor), by = Plan]
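
Because every value within a group is identical in the example data, both functions give the same result there. A small sketch on a made-up vector (the name x and its values are only for illustration) may make the difference visible:

```r
library(zoo)

# Hypothetical vector with distinct values and a trailing NA
x <- c(1, 3, NA)

# Last observation carried forward: the NA takes the previous value
locf_filled <- na.locf(x, na.rm = FALSE)
print(locf_filled)  # 1 3 3

# Aggregate fill: the NA takes the mean of the non-NA values
mean_filled <- na.aggregate(x)
print(mean_filled)  # 1 3 2
```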

For multiple columns

nm1 <- c("Price", "factor")
data2[, (nm1) := lapply(.SD, na.aggregate), by = Plan, .SDcols = nm1]
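
If adding zoo as a dependency is not desirable, the same idea can be sketched with data.table alone. This is only one possible variant, not part of the original answer: the table dt, its values, and the name cols are made up for illustration (the values differ within each group so that the mean is visible), and fifelse assumes a reasonably recent data.table version.

```r
library(data.table)

# Hypothetical data: values differ within each Plan group
dt <- data.table(
  Plan   = c(11, 11, 11, 91, 91, 91),
  Price  = c(4, 6, NA, 3, 5, NA),
  factor = c(0.1, 0.3, NA, 0.2, 0.4, NA)
)

cols <- c("Price", "factor")

# Within each Plan, replace every NA with the mean of the non-NA values
dt[, (cols) := lapply(.SD, function(x) fifelse(is.na(x), mean(x, na.rm = TRUE), x)),
   by = Plan, .SDcols = cols]

print(dt)  # Price in rows 3 and 6 is now 5 and 4
```

Note that a group containing only NA values would end up with NaN here, since there is nothing to average.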
akrun