
I am trying to calculate the mean and standard deviation from certain columns in a data frame, and return those values to new columns in the data frame. I can get this to work for mean:

library(dplyr)
mtcars = mutate(mtcars, mean=(hp+drat+wt)/3)

However, when I try to do the same for standard deviation, I have an issue, because I cannot hardcode the equation like I did for mean very easily. So, I try to use a function, as follows:

mtcars = mutate(mtcars, mean=(hp+drat+wt)/3, stdev = sd(hp,drat,wt))

Resulting in the error "Error in sd(hp, drat, wt) : unused argument (wt)". How can I correct my syntax? Thank you.

  • To calculate the mean you wrote out the formula, but to calculate the SD you used the built-in `sd` function in some strange way. Doesn't that look inconsistent to you? – David Arenburg Apr 11 '15 at 18:45
  • Yes, that is why I stated "when I try to do the same for standard deviation, I have an issue, because I cannot hardcode the equation like I did for mean very easily. So, I try to use a function." I am not sure why you think I used the sd function in some strange way, even though I am sure that is true. The sd function seems to take in a vector of numeric, for instance sd(c(3,5,6)). Even though I am sure it is obvious to you, why is what I am doing not correct? Thanks. –  Apr 11 '15 at 20:29
  • Perhaps what @DavidArenburg is suggesting is that your call to `sd` is incorrect, which it is, in a commonly mistaken way. For instance, try `sd(1,2,3)`, then read `?sd` and see (1) that it describes the first argument as "x: a numeric vector", and (2) it specifically does *not* include "..." (ellipses, that would allow for an arbitrary number of arguments as you are providing). – r2evans Apr 12 '15 at 01:41
  • @user2808302 Using `+` to get the mean may not work as expected if there are NAs. `mean` and `rowMeans` have an option for removing NAs, i.e. `na.rm=TRUE`. – akrun Apr 12 '15 at 05:41
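
The two points raised in these comments can be sketched in base R. The data frame `df` and its columns `a`, `b`, `c` are made up for illustration: `sd()` takes one numeric vector as its first argument and has no `...`, and adding columns with `+` propagates `NA`, while `rowMeans()` can skip missing values.

```r
# sd() expects a single numeric vector as its first argument
sd_ok <- sd(c(3, 5, 6))

# Passing the values as separate arguments fails, because sd() has no "..."
sd_err <- tryCatch(sd(1, 2, 3), error = function(e) conditionMessage(e))

# Hypothetical data with one missing value in column b
df <- data.frame(a = c(1, 4), b = c(2, NA), c = c(3, 6))

# An NA in any of the added columns makes the whole row result NA
mean_plus <- (df$a + df$b + df$c) / 3

# rowMeans() can drop the NA and average the remaining values
mean_rows <- rowMeans(df[c("a", "b", "c")], na.rm = TRUE)
```

Here `sd_err` holds the error message rather than stopping the script, which makes it easy to compare with the message in the question.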

3 Answers


You could try

library(dplyr)
library(matrixStats)
nm1 <- c('hp', 'drat', 'wt')
res1 <- mtcars %>% 
           mutate(Mean= rowMeans(.[nm1]), stdev=rowSds(as.matrix(.[nm1])))

head(res1,3)
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb     Mean    stdev
#1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 38.84000 61.62969
#2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 38.92500 61.55489
#3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 33.05667 51.91809

Or using do

res2 <- mtcars %>% 
             rowwise() %>%
             do(data.frame(., Mean=mean(unlist(.[nm1])),
                         stdev=sd(unlist(.[nm1]))))

head(res2,3)
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb     Mean    stdev
#1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 38.84000 61.62969
#2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 38.92500 61.55489
#3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 33.05667 51.91809
akrun
  • @akrun Thanks, but when I run your first code, I get an error "Error in .[nm1] : object of type 'closure' is not subsettable" –  Apr 11 '15 at 20:55
  • @user2808302 I am not sure about the problem. Are you using recent versions of `dplyr`? I used `dplyr_0.4.1.9000` – akrun Apr 11 '15 at 20:56
  • Thanks @akrun. I just did install.packages("dplyr") and then sessionInfo() showed it was version dplyr_0.4.1 . I reran the code and got the same error! –  Apr 11 '15 at 21:01
  • @user2808302 Can you try by `mtcars %>% mutate(..` as in the update. – akrun Apr 11 '15 at 21:02
  • You're selecting the columns, so you should edit `as.matrix(.[nm1])` to `as.matrix(.[ ,nm1])`. – Ehsan M. Kermani Oct 13 '15 at 20:23
  • @EhsanM.Kermani We selected the columns from a data.frame, for which `.[nm1]` gets the columns by default, and only then converted to `matrix`. If it was already a matrix, then `.[, nm1]` would be the right way. So, in this case either one works. If in doubt, please check the result of both cases; they would be the same. – akrun Oct 14 '15 at 02:00
  • I get a bunch of Warnings using the `rowwise()` function, but if I use `group_by(row_number())` (or some other explicit rowID) those Warnings go away. – Brian D Apr 02 '20 at 23:29
  • @BrianD It is the deprecation warning "`do()` is deprecated as of dplyr 1.0.0". This is an old post. The package gets updated with new functions, and old functions are deprecated. – akrun Apr 02 '20 at 23:31
  • ah, I was using dplyr 0.8.3, and R 3.5.3 – Brian D Apr 20 '20 at 18:38
  • that is a bit old – akrun Apr 20 '20 at 18:40
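
Since `do()` is deprecated as of dplyr 1.0.0 (as noted in the comments), the row-wise variant could be written with `rowwise()` and `c_across()` instead. This is a sketch that assumes dplyr 1.0.0 or later; it is one possible rewrite, not the answer's original code, and the name `res3` is only illustrative.

```r
library(dplyr)

nm1 <- c("hp", "drat", "wt")

res3 <- mtcars %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into a vector
  mutate(Mean  = mean(c_across(all_of(nm1))),
         stdev = sd(c_across(all_of(nm1)))) %>%
  ungroup()   # drop the row-wise grouping again

head(res3, 3)
```

For wide data the `matrixStats::rowSds()` approach is likely to be faster, because `rowwise()` evaluates the expression once per row.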

You can also write your own vectorised RowSD function as in

RowSD <- function(x) {
  sqrt(rowSums((x - rowMeans(x))^2)/(dim(x)[2] - 1))
}

and then

mtcars %>% 
  mutate(mean = (hp + drat + wt)/3, stdev = RowSD(cbind(hp, drat, wt)))
##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb      mean     stdev
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  38.84000  61.62969
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  38.92500  61.55489
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  33.05667  51.91809
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  38.76500  61.69136
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  60.53000  99.13403
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  37.07333  58.82726
## ...
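
`RowSD` is the sample standard deviation (denominator n - 1) applied to each row; subtracting `rowMeans(x)` from the matrix works because R recycles the vector down the columns, so every element has its own row mean removed. A quick sanity check against `sd()` applied row by row, using base R only (the names `m` and `ref` are illustrative):

```r
# Same definition as in the answer above
RowSD <- function(x) {
  sqrt(rowSums((x - rowMeans(x))^2) / (dim(x)[2] - 1))
}

m <- as.matrix(mtcars[c("hp", "drat", "wt")])

# Reference result: built-in sd() applied to each row in turn
ref <- apply(m, 1, sd)

# Compare the vectorised version with the reference
all.equal(RowSD(m), ref)
```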
David Arenburg

Not much change needed, just add rowwise() (thanks @akrun for the comment) and wrap your column names in c(...) (to fix the error):

library(dplyr)
mtcars %>%
    rowwise() %>%
    mutate(mean=(hp+drat+wt)/3, stdev = sd(c(hp,drat,wt)))
## Source: local data frame [32 x 13]
## Groups: <by row>
##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb     mean     stdev
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 38.84000  61.62969
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 38.92500  61.55489
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 33.05667  51.91809
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 38.76500  61.69136
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 60.53000  99.13403
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 37.07333  58.82726
## 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 83.92667 139.49371
## 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 22.96000  33.81056
## 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 34.02333  52.80875
## 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 43.45333  68.88985
## ..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...      ...       ...
r2evans
  • Hi, using the same command gives me an identical value for sd in every row. The mean is working fine. See the output below – Chirag Feb 02 '18 at 06:41
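
One way to end up with the same `stdev` in every row is to leave out `rowwise()`. This is only an assumption about the cause, since the output mentioned in the comment is not shown. Without `rowwise()`, `c(hp, drat, wt)` concatenates the three full columns, so `sd()` returns a single number that is recycled down the column. A sketch, assuming dplyr is installed:

```r
library(dplyr)

# Without rowwise(): one sd over all 96 values, repeated in every row
no_rowwise <- mtcars %>%
  mutate(stdev = sd(c(hp, drat, wt)))

# With rowwise(): sd is computed from the three values of each row
with_rowwise <- mtcars %>%
  rowwise() %>%
  mutate(stdev = sd(c(hp, drat, wt))) %>%
  ungroup()

length(unique(no_rowwise$stdev))    # a single distinct value
length(unique(with_rowwise$stdev))  # more than one distinct value
```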