
Let's say I have the following dataframe:

df <- as.data.frame(rbind(c(5, NA, NA, 1, NA, NA),
                         c(NA, 2, 2, NA, 0.5, 0.5), 
                         c(NA, NA, NA, NA, NA, NA),
                         c(1, 1, 1, 0.33, 0.33, 0.33)))
colnames(df) <- c("V1", "V2", "V3", "W1", "W2", "W3")

I would like to add a column with the weighted mean to the dataframe, ignoring NAs where they occur. For example, in the first row only V1 and W1 would be used to calculate the weighted mean.

My final dataframe would look like this:

  V1 V2 V3   W1   W2   W3 Wmean
1  5 NA NA 1.00   NA   NA     5
2 NA  2  2   NA 0.50 0.50     2
3 NA NA NA   NA   NA   NA    NA
4  1  1  1 0.33 0.33 0.33     1

Note that the weighted mean is calculated as Wmean = (V1*W1 + V2*W2 + V3*W3)/(W1 + W2 + W3), leaving out any term whose value or weight is NA.
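A minimal sketch of that formula for a single row in base R, assuming a pair is dropped whenever either its value or its weight is NA, and that a row with nothing left should give NA. The helper name `wmean_pair` is hypothetical, not part of any package.

```r
# Hypothetical helper: weighted mean over the pairs where both v and w are present
wmean_pair <- function(v, w) {
  keep <- !is.na(v) & !is.na(w)      # pairs that have both a value and a weight
  if (!any(keep)) return(NA_real_)   # nothing usable (row 3): return NA
  sum(v[keep] * w[keep]) / sum(w[keep])
}

wmean_pair(c(5, NA, NA), c(1, NA, NA))    # row 1: 5*1 / 1 = 5
wmean_pair(c(NA, 2, 2), c(NA, 0.5, 0.5))  # row 2: (2*0.5 + 2*0.5) / (0.5 + 0.5) = 2
wmean_pair(c(NA, NA, NA), c(NA, NA, NA))  # row 3: NA
```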

Liky

3 Answers


Using:

df$Wmean <- rowSums(df[1:3] * df[4:6], na.rm = TRUE) / rowSums(df[4:6], na.rm = TRUE)

gives:

> df
  V1 V2 V3   W1   W2   W3 Wmean
1  5 NA NA 1.00   NA   NA     5
2 NA  2  2   NA 0.50 0.50     2
3 NA NA NA   NA   NA   NA   NaN
4  1  1  1 0.33 0.33 0.33     1
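Row 3 comes out as NaN rather than the NA asked for, because with na.rm = TRUE both sums are 0 and 0/0 is NaN in R. A small sketch, assuming NA is the preferred marker for "no usable data", converts it afterwards:

```r
df <- as.data.frame(rbind(c(5, NA, NA, 1, NA, NA),
                          c(NA, 2, 2, NA, 0.5, 0.5),
                          c(NA, NA, NA, NA, NA, NA),
                          c(1, 1, 1, 0.33, 0.33, 0.33)))
colnames(df) <- c("V1", "V2", "V3", "W1", "W2", "W3")

# Same rowSums approach as above
df$Wmean <- rowSums(df[1:3] * df[4:6], na.rm = TRUE) / rowSums(df[4:6], na.rm = TRUE)

# Row 3 has no usable values, so 0/0 gives NaN; turn that into NA
df$Wmean[is.nan(df$Wmean)] <- NA
df
```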

If you don't know the positions of the V and W columns beforehand, you can select them by name (as @Ronak does with startsWith):

Vi <- grep("^V", names(df))
Wi <- grep("^W", names(df))

df$Wmean <- rowSums(df[Vi] * df[Wi], na.rm = TRUE) / rowSums(df[Wi], na.rm = TRUE)
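The rowSums approach gives the intended result only when a weight is NA wherever its value is NA, which holds for the example data. If a value can be missing while its weight is present, the product is dropped from the numerator but the weight still counts in the denominator. A minimal sketch, assuming such pairs should be ignored entirely, masks the weights first; the fifth row is made up purely for illustration:

```r
# Example data plus one made-up row where V2 is missing but W2 is not
df <- as.data.frame(rbind(c(5, NA, NA, 1, NA, NA),
                          c(NA, 2, 2, NA, 0.5, 0.5),
                          c(NA, NA, NA, NA, NA, NA),
                          c(1, 1, 1, 0.33, 0.33, 0.33),
                          c(4, NA, 2, 1, 1, 1)))
colnames(df) <- c("V1", "V2", "V3", "W1", "W2", "W3")

Vi <- grep("^V", names(df))
Wi <- grep("^W", names(df))

V <- as.matrix(df[Vi])
W <- as.matrix(df[Wi])
W[is.na(V)] <- NA   # a weight only counts if its value is present

df$Wmean <- rowSums(V * W, na.rm = TRUE) / rowSums(W, na.rm = TRUE)
df$Wmean   # row 5 is (4*1 + 2*1) / (1 + 1) = 3 instead of 6 / 3 = 2
```

This still assumes the V and W columns appear in matching order, as the original answer does.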
Jaap

We can get the indices of the "V" and "W" columns separately and then use apply to compute weighted.mean for each row, ignoring the NA values.

V_index <- startsWith(names(df), "V")
W_index <- startsWith(names(df), "W")
df$WMean <- apply(df, 1, function(x) 
                  weighted.mean(x[V_index], x[W_index], na.rm = TRUE))

df

#  V1 V2 V3   W1   W2   W3 WMean
#1  5 NA NA 1.00   NA   NA     5
#2 NA  2  2   NA 0.50 0.50     2
#3 NA NA NA   NA   NA   NA   NaN
#4  1  1  1 0.33 0.33 0.33     1
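One detail worth knowing: in base R, weighted.mean(na.rm = TRUE) drops positions where x is NA, but an NA in the weights is kept and makes the result NA. That is fine for this data, where values and weights are missing together. A sketch that drops a pair whenever either side is NA, assuming that is the desired behaviour:

```r
# na.rm = TRUE removes NA values in x, not NA weights
weighted.mean(c(4, NA, 2), c(1, 1, 1), na.rm = TRUE)  # 3
weighted.mean(c(4, 3, 2), c(1, NA, 1), na.rm = TRUE)  # NA

df <- as.data.frame(rbind(c(5, NA, NA, 1, NA, NA),
                          c(NA, 2, 2, NA, 0.5, 0.5),
                          c(NA, NA, NA, NA, NA, NA),
                          c(1, 1, 1, 0.33, 0.33, 0.33)))
colnames(df) <- c("V1", "V2", "V3", "W1", "W2", "W3")

V_index <- startsWith(names(df), "V")
W_index <- startsWith(names(df), "W")

# Keep only the pairs where both the value and the weight are present
df$WMean <- apply(df, 1, function(x) {
  v <- x[V_index]
  w <- x[W_index]
  keep <- !is.na(v) & !is.na(w)
  weighted.mean(v[keep], w[keep])
})
df
```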
Ronak Shah

You can also replace the NAs with 0s and compute the formula directly (note that this overwrites the NAs in df itself):

df[is.na(df)] <- 0
df$mean = ((df$V1 * df$W1) + (df$V2 * df$W2) + (df$V3 * df$W3))/(df$W1 + df$W2 + df$W3)

Output:

  V1 V2 V3   W1   W2   W3 mean
1  5  0  0 1.00 0.00 0.00    5
2  0  2  2 0.00 0.50 0.50    2
3  0  0  0 0.00 0.00 0.00  NaN
4  1  1  1 0.33 0.33 0.33    1
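A variant of the same idea, sketched under the assumption that the original NAs should be kept and that the V and W columns appear in matching order: zero-fill a copy and select the columns by name, so it works for any number of V/W pairs. The names df0, V and W are only illustrative.

```r
df <- as.data.frame(rbind(c(5, NA, NA, 1, NA, NA),
                          c(NA, 2, 2, NA, 0.5, 0.5),
                          c(NA, NA, NA, NA, NA, NA),
                          c(1, 1, 1, 0.33, 0.33, 0.33)))
colnames(df) <- c("V1", "V2", "V3", "W1", "W2", "W3")

# Work on a copy so the NAs in df are preserved
df0 <- df
df0[is.na(df0)] <- 0

V <- as.matrix(df0[startsWith(names(df0), "V")])
W <- as.matrix(df0[startsWith(names(df0), "W")])

# Missing weights become 0 and drop out of both sums;
# a missing value whose weight is present would still count in the denominator
df$mean <- rowSums(V * W) / rowSums(W)
df
```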
Ankur Sinha