Replace missing value with mean of class within column

Question

I have a large dataset with some missing values (NAs). I'm looking to replace these values with the column means but by class, that is, where items in class k have a missing value in column j, that value will be replaced by the mean of values in column J for items in class k. Additionally, I want to do this with only base R or dplyr.

The class aspect brings an additional problem compared to the well-known one that's already been answered here: Replace missing values with column mean.

In fact I can adapt one of the solutions there into a clumsy solution for my problem:

NA2mean <- function(x){replace(x, is.na(x), mean(x, na.rm = TRUE))}
DF %>% filter(DF$class=="A") -> A
A <- lapply(A,NA2mean)

(where the dataframe is DF and I have assumed the factor is stored in the column 'class'.)

Then you'd repeat this for every other class (e.g. B, C, D, E, F). Finally you could use DF <- rbind(A,B,C,D,E,F) to replace your old dataframe with the corrected one.

The dataframe in my case is ordered by class (i.e. A first, then B, then C, ...) and I'd like to keep it that way.

Any way of doing this much more efficiently?

score 1 · Answer 1 · answered May 01 '20 at 01:03

1

Base R Solution:

df[, sapply(df, is.numeric)] <-
  do.call("rbind", lapply(split(df[, sapply(df, is.numeric)], df$class), function(x) {
    x <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
  }))

answered May 01 '20 at 01:03

hello_friend

5,682
1
11
15

Thank you! Nice solution – Mobeus Zoom May 01 '20 at 22:54

akrun · Answer 2 · 2020-05-01T22:57:21.693

1

We can use na.aggregate from zoo

library(dplyr)
library(zoo)
DF %>%
  group_by(class) %>%
  mutate_at(vars(-group_cols()), na.aggregate)

If we need base R

nm1 <- setdiff(names(DF), "class")
DF[nm1] <- lapply(DF[nm1], function(vec) ave(vec, class, FUN = NA2mean))

edited May 01 '20 at 22:57

answered May 01 '20 at 19:30

akrun

874,273
37
540
662

Thanks for replying, but I asked for base R and dplyr only. – Mobeus Zoom May 01 '20 at 22:54
@MobeusZoom `na.aggregate` automatically changes the NA by replacing with mean – akrun May 01 '20 at 22:55

score 0 · Accepted Answer · answered May 01 '20 at 00:57

0

Using dplyr, you could group_by Class and apply NA2mean for every column.

library(dplyr)
DF %>% group_by(class) %>% mutate_all(NA2mean)

In the newer version of dplyr, you can do this across

DF %>% group_by(class) %>% mutate(across(everything(), NA2mean))

answered May 01 '20 at 00:57

Ronak Shah

377,200
20
156
213

Thanks! The simplest good solution, and you gave two of them. – Mobeus Zoom May 01 '20 at 22:54

Replace missing value with mean of class within column

3 Answers3