4

I have a dataframe of 10 columns with house prices, that in some cases, includes NAs. I want to create a new column of weighted sd but for the rows that have a few NAs, I get the following error:

Error in e2[[j]] : subscript out of bounds

What I use per row (and works for rows without NAs):

weighted.sd(my.df[40,2:10], c(9,9,9,9,9,9,9,9,9), na.rm = TRUE)

Example

library(radiant.data)
data("mtcars")
mtcars[mtcars == 0] <- NA
weighted.sd(mtcars[18,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#works
weighted.sd(mtcars[5,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#issue here

What is the problem here and how can I create a new column with the weighted SD per row?

Boann
  • 48,794
  • 16
  • 117
  • 146
foo
  • 33
  • 6
  • 1
    Please provide your data.frame `my.df`, output through `dput()`, or an otherwise [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of the error so we can better troubleshoot. – caldwellst Jan 30 '20 at 13:23

2 Answers2

1

The problem appears to be that weighted.sd() will not operate as you are expecting across rows of a data frame.

Running weighted.sd we can see the code:

weighted.sd <- function (x, wt, na.rm = TRUE) 
{
  if (na.rm) {
    x <- na.omit(x)
    wt <- na.omit(wt)
  }
  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

In your example, you are not feeding in a vector for x, but rather a single row of a data frame. Function na.omit(x) will remove that entire row, due to the NA values - not elements of the vector.

You can try to convert the row to a vector with as.numeric(), but that will fail for this function as well due to how NA is removed from wt.

It seems like something like this is probably what you want. Of course, you have to be careful that you are feeding in valid columns for x.

weighted.sd2 <- function (x, wt, na.rm = TRUE) {

  x <- as.numeric(x)

  if (na.rm) {
    is_na <- is.na(x)

    x <- x[!is_na]
    wt <- wt[!is_na]
  }

  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}
weighted.sd2(mtcars[18,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#works
# [1] 26.76086
weighted.sd2(mtcars[5,1:11], c(11,11,11,11,11,11,11,11,11,11,11), na.rm = TRUE)#issue here
# [1] 116.545

To apply this to all columns, you can use apply().

mtcars$weighted.sd <- apply(mtcars[,1:11], 1, weighted.sd2, wt = rep(11, 11))
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb weighted.sd
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46 NA  1    4    4    52.61200
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02 NA  1    4    4    52.58011
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    37.06108
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1 NA    3    1    78.36300
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02 NA NA    3    2   116.54503
...
  • Thank you for the response @Adam! The different weighted.sd seem to work for my data set. However, how can I populate a new column of weighted SD based on both of them? – foo Jan 30 '20 at 14:23
  • Ah sorry I forgot that part. I added this in an edit. –  Jan 30 '20 at 14:44
0

If you do a CTRL+click on weigted.sd function you can see the source code:

function (x, wt, na.rm = TRUE) 
{
  if (na.rm) {
    x <- na.omit(x)
    wt <- na.omit(wt)
  }
  wt <- wt/sum(wt)
  wm <- weighted.mean(x, wt)
  sqrt(sum(wt * (x - wm)^2))
}

When you run it, value vector contain values without NA's and it is reduced. But the weigth vector has the same length as before, resulting in an error.

A solution would be:

weighted.sd(mtcars[5,!is.na(mtcars[5,1:11])], 
c(11,11,11,11,11,11,11,11,11,11,11)[!is.na(mtcars[5,1:11])], na.rm = TRUE)

It's not elegant... But it does the job!

Tibo
  • 68
  • 1
  • 7
  • The function is actually dropping the entire (single) row of the data frame, not reducing the length of the vector. So it is not a mismatch between the length of `x` and `wt` as vectors. But yes, you can remove the `NA`s ahead of time. –  Jan 30 '20 at 14:11