1

I want to find the mean and standard deviation of the values present in a data frame,

> print(Data)
  Description   X24386
0 A             65.8751
1 A             152.6380
2 A             90.0535
3 A             NA
4 B             74.4218
5 B             77.7234
6 B             79.7033
7 B             72.5045
8 C             79.3704
9 C             81.9795

Using the dcast aggregate I could group the data and find the mean

result<-dcast(D, Description ~ . ,fun.aggregate=mean)   

> print(result)
  Description        .
1 B                  76.08
2 C                  80.67
3 A                   NA

Since a non-numeric value is present in one of the rows of group A, the mean is NA.

I'm looking for a syntax using which I can compute the mean of the columns that contain numeric values. I looked at the answers given in posts here

Since I want to find the mean for a specific column I am facing trouble in implementing it for my case.

Also, I would like to ask for suggestions on how

result<-dcast(D, Description ~ . ,fun.aggregate=mean)

can be modified to print the standard deviations in the column next to the mean.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Natasha
  • 1,111
  • 5
  • 28
  • 66

1 Answers1

3

You can add the na.rm = TRUE argument for mean (see ?mean) to the end of the dcast:

dcast(D, Description ~ . ,fun.aggregate=mean, na.rm = TRUE)

  Description         .
1           A 102.85553
2           B  76.08825
3           C  80.67495

If you want standard deviation too, I'd switch to using dplyr so you can group the data then summarise as many times as you like.

library(dplyr)
D %>% 
  group_by(Description) %>% 
  summarise(Mean = mean(X24386, na.rm = TRUE), 
            SD   = sd(X24386, na.rm = TRUE))

# A tibble: 3 x 3
  Description  Mean    SD
  <chr>       <dbl> <dbl>
1 A           103.  44.8 
2 B            76.1  3.23
3 C            80.7  1.84

You could use aggregate too:

aggregate(. ~ Description, data = D, FUN = function(x) c(mn = mean(x), sd = sd(x)))

  Description  X24386.mn  X24386.sd
1           A 102.855533  44.775766
2           B  76.088250   3.233297
3           C  80.674950   1.844912
neilfws
  • 32,751
  • 5
  • 50
  • 63
  • 1
    note that if `agg` is the result of your `aggregate`, you cannot do `agg[, "X24386.mn"]`, it will throw an error. This is because `agg` only has *two* columns, the second is a `matrix` with columns `mn` and `sd`. See `str(agg)` and the accepted answer to [this question](https://stackoverflow.com/questions/12064202/apply-several-summary-functions-on-several-variables-by-group-in-one-call). – Rui Barradas Aug 22 '18 at 05:42
  • Is there any command that can give the output in the following format, `Description X24386.mn ± X24386.sd` `1 A 102.855533 ± 44.775766` `2 B 76.088250 ± 3.233297` `3 C 80.674950 ± 1.844912` In short, I wish to output the mean and standard deviation in a single column so that it will be easy to export the data to excel. – Natasha Aug 22 '18 at 10:16
  • I found some answers [here](https://stackoverflow.com/questions/5559467/how-to-merge-two-columns-in-r-with-a-specific-symbol) that show how to merge two columns with a symbol in between. Since the second element is a matrix I'm not able to apply those steps. Any suggestions? – Natasha Aug 22 '18 at 11:16