I have a large dataset with some missing values (NAs). I'd like to replace these values with column means computed by class: where an item in class k has a missing value in column j, that value should be replaced by the mean of the non-missing values in column j for items in class k. I'd also like to do this using only base R or dplyr.
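To make the goal concrete, here is a tiny hypothetical example (the data frame `DF`, its columns `x` and `y`, and the values are made up for illustration). The `expected` data frame is the result filled in by hand: each NA becomes the mean of the same column within the same class.

```r
# Hypothetical toy data: a 'class' column plus two numeric columns with NAs
DF <- data.frame(
  class = c("A", "A", "A", "B", "B"),
  x = c(1, NA, 3, 10, NA),
  y = c(NA, 4, 6, 5, 7)
)

# Desired result, filled in by hand:
#   x[2] (class A) -> mean(c(1, 3)) = 2
#   x[5] (class B) -> mean(10)      = 10
#   y[1] (class A) -> mean(c(4, 6)) = 5
expected <- data.frame(
  class = c("A", "A", "A", "B", "B"),
  x = c(1, 2, 3, 10, 10),
  y = c(5, 4, 6, 5, 7)
)
```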
Grouping by class adds a complication compared to the well-known question that has already been answered here: Replace missing values with column mean.
In fact I can adapt one of the solutions there into a clumsy solution for my problem:
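library(dplyr)
NA2mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
A <- DF %>% filter(class == "A")    # keep only the rows of class A
num <- sapply(A, is.numeric)        # skip the non-numeric 'class' column
A[num] <- lapply(A[num], NA2mean)   # A[...] <- keeps A a data frame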
(Here the data frame is DF and the class factor is stored in the column 'class'.)
I'd then repeat this for every other class (e.g. B, C, D, E, F) and finally use DF <- rbind(A, B, C, D, E, F) to replace the old data frame with the corrected one.
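As a sketch of the same per-class approach with the manual repetition automated (base R only), assuming the toy `DF` below stands in for the real data and that the rows are sorted by class in alphabetical (factor-level) order. The helper `impute_one` is a hypothetical name introduced here, not part of any package.

```r
# Replace NAs in a numeric vector with the mean of its non-missing values
NA2mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Hypothetical toy data, already ordered by class
DF <- data.frame(
  class = c("A", "A", "A", "B", "B"),
  x = c(1, NA, 3, 10, NA),
  y = c(NA, 4, 6, 5, 7)
)

# Hypothetical helper: apply NA2mean to the numeric columns of one class's rows
impute_one <- function(part) {
  num <- sapply(part, is.numeric)
  part[num] <- lapply(part[num], NA2mean)
  part
}

# split() groups the rows by class (levels sorted, so A, B, ... in order);
# rbind then stacks the corrected pieces back together
parts <- lapply(split(DF, DF$class), impute_one)
DF <- do.call(rbind, parts)
rownames(DF) <- NULL  # drop the "A.1", "B.4", ... row names added by rbind
```

This only keeps the original row order because the data are already sorted by class; with an unsorted class column, the output would come back grouped by class instead.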
In my case the data frame is ordered by class (i.e. A first, then B, then C, ...) and I'd like to keep it that way.
Is there a much more efficient way of doing this?