Replace missing values with row means if exactly N missing values per row

Question

I have a data matrix with different number of missing values per rows. What I want is to replace the missing values with row means if the number of missing values per row is N (let's say 1).

I have already created a solution for this problem but it's a very inelegant one so I'm looking for something else.

My solution:

#SAMPLE DATA

a <- c(rep(c(1:4, NA), 2))
b <- c(rep(c(1:3, NA, 5), 2))
c <- c(rep(c(1:3, NA, 5), 2))

df <- as.matrix(cbind(a,b,c), ncol = 3, nrow = 10)

#CALCULATING THE NUMBER OF MISSING VALUES PER ROW

miss_row <- rowSums(apply(as.matrix(df), c(1,2), function(x) {
  sum(is.na(x)) +
  sum(x == "", na.rm=TRUE)
}) )

df <- cbind(df, miss_row)

#CALCULATING THE ROW MEANS FOR ROWS WITH 1 MISSING VALUE

row_mean <- ifelse(df[,4] == 1, rowMeans(df[,1:3], na.rm = TRUE), NA)

df <- cbind(df, row_mean)

Just `df[rowSums(is.na(df))>N] <- rowMeans(df, na.rm=TRUE)[rowSums(is.na(df))>N]` will do. Some remarks on the way: `cbind` creates a `matrix` so no need for `as.matrix` and `is.na` is vectorised, you can directly do `rowSums(is.na(df))` to get number of NA values. — Cath, Jul 12 '18 at 08:10
`df$miss_row <- rowSums(is.na(df));df$row_mean <- NA; df$row_mean[df$miss_row == 1] = rowMeans(df[df$miss_row == 1, 1:3], na.rm = TRUE) ` — Ronak Shah, Jul 12 '18 at 08:11
you say **replace** missing values with row means but your current solution nor the ones in the comment section do that... — s_baldur, Jul 12 '18 at 08:14
@snoram my solution in comment does replace missing values with row mean... — Cath, Jul 12 '18 at 08:14
Sorry, you're right. But isn't *original poster* looking for the case where `N==1`. — s_baldur, Jul 12 '18 at 08:17

score 5 · Accepted Answer · answered Jul 12 '18 at 08:25

Here is the way I mentionned in comment, with more details:

# create your matrix
df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there

# Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
nb_NA_row <- rowSums(is.na(df))

# Replace missing values row-wise by the row mean when there is N NA in the row
N <- 1 # the given example
df[nb_NA_row==N] <- rowMeans(df, na.rm=TRUE)[nb_NA_row==N]

# check df

df
#      a  b  c
# [1,] 1  1  1
# [2,] 2  2  2
# [3,] 3  3  3
# [4,] 4 NA NA
# [5,] 5  5  5
# [6,] 1  1  1
# [7,] 2  2  2
# [8,] 3  3  3
# [9,] 4 NA NA
#[10,] 5  5  5

score 1 · Answer 2 · answered Jul 12 '18 at 08:18

df <- data.frame(df)
df$miss_row <- rowSums(is.na(df))
df$row_mean <- NA
df$row_mean[df$miss_row == 1] <- rowMeans(df[df$miss_row == 1,1:3],na.rm = TRUE)
#     a  b  c miss_row row_mean
# 1   1  1  1        0       NA
# 2   2  2  2        0       NA
# 3   3  3  3        0       NA
# 4   4 NA NA        2       NA
# 5  NA  5  5        1        5
# 6   1  1  1        0       NA
# 7   2  2  2        0       NA
# 8   3  3  3        0       NA
# 9   4 NA NA        2       NA
# 10 NA  5  5        1        5

(This gives your expected output, which seems not to be completely in line with your text, but for this see comments and duplicate link)

Replace missing values with row means if exactly N missing values per row

2 Answers2