1

I have a data matrix with different number of missing values per rows. What I want is to replace the missing values with row means if the number of missing values per row is N (let's say 1).

I have already created a solution for this problem but it's a very inelegant one so I'm looking for something else.

My solution:

#SAMPLE DATA

a <- c(rep(c(1:4, NA), 2))
b <- c(rep(c(1:3, NA, 5), 2))
c <- c(rep(c(1:3, NA, 5), 2))

df <- as.matrix(cbind(a,b,c), ncol = 3, nrow = 10)

#CALCULATING THE NUMBER OF MISSING VALUES PER ROW

miss_row <- rowSums(apply(as.matrix(df), c(1,2), function(x) {
  sum(is.na(x)) +
  sum(x == "", na.rm=TRUE)
}) )

df <- cbind(df, miss_row)

#CALCULATING THE ROW MEANS FOR ROWS WITH 1 MISSING VALUE

row_mean <- ifelse(df[,4] == 1, rowMeans(df[,1:3], na.rm = TRUE), NA)

df <- cbind(df, row_mean)
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • 3
    Just `df[rowSums(is.na(df))>N] <- rowMeans(df, na.rm=TRUE)[rowSums(is.na(df))>N]` will do. Some remarks on the way: `cbind` creates a `matrix` so no need for `as.matrix` and `is.na` is vectorised, you can directly do `rowSums(is.na(df))` to get number of NA values. – Cath Jul 12 '18 at 08:10
  • `df$miss_row <- rowSums(is.na(df));df$row_mean <- NA; df$row_mean[df$miss_row == 1] = rowMeans(df[df$miss_row == 1, 1:3], na.rm = TRUE) ` – Ronak Shah Jul 12 '18 at 08:11
  • you say **replace** missing values with row means but your current solution nor the ones in the comment section do that... – s_baldur Jul 12 '18 at 08:14
  • @snoram my solution in comment does replace missing values with row mean... – Cath Jul 12 '18 at 08:14
  • 1
    Sorry, you're right. But isn't *original poster* looking for the case where `N==1`. – s_baldur Jul 12 '18 at 08:17
  • 2
    @snoram so `==N` instead of `>N`... – Cath Jul 12 '18 at 08:19
  • It's slightly confusing you call your matrix `df`. – Axeman Jul 12 '18 at 09:02

2 Answers2

5

Here is the way I mentionned in comment, with more details:

# create your matrix
df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there

# Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
nb_NA_row <- rowSums(is.na(df))

# Replace missing values row-wise by the row mean when there is N NA in the row
N <- 1 # the given example
df[nb_NA_row==N] <- rowMeans(df, na.rm=TRUE)[nb_NA_row==N]

# check df

df
#      a  b  c
# [1,] 1  1  1
# [2,] 2  2  2
# [3,] 3  3  3
# [4,] 4 NA NA
# [5,] 5  5  5
# [6,] 1  1  1
# [7,] 2  2  2
# [8,] 3  3  3
# [9,] 4 NA NA
#[10,] 5  5  5
Cath
  • 23,906
  • 5
  • 52
  • 86
1
df <- data.frame(df)
df$miss_row <- rowSums(is.na(df))
df$row_mean <- NA
df$row_mean[df$miss_row == 1] <- rowMeans(df[df$miss_row == 1,1:3],na.rm = TRUE)
#     a  b  c miss_row row_mean
# 1   1  1  1        0       NA
# 2   2  2  2        0       NA
# 3   3  3  3        0       NA
# 4   4 NA NA        2       NA
# 5  NA  5  5        1        5
# 6   1  1  1        0       NA
# 7   2  2  2        0       NA
# 8   3  3  3        0       NA
# 9   4 NA NA        2       NA
# 10 NA  5  5        1        5

(This gives your expected output, which seems not to be completely in line with your text, but for this see comments and duplicate link)

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167