I want to get the mean() and sd() of each numeric column in the iris dataset, grouped by the value in the Species column:

> head(iris[order(runif(nrow(iris))), ])
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
50           5.0         3.3          1.4         0.2     setosa
111          6.5         3.2          5.1         2.0  virginica
69           6.2         2.2          4.5         1.5 versicolor
150          5.9         3.0          5.1         1.8  virginica

Without distinguishing among the three species, apply does the trick:

> stats <- apply(iris[, 1:4], MARGIN = 2, function(x) c(mean = mean(x), sd = sd(x)))
> stats
     Sepal.Length Sepal.Width Petal.Length Petal.Width
mean    5.8433333   3.0573333     3.758000   1.1993333
sd      0.8280661   0.4358663     1.765298   0.7622377

But how can I get these results broken down by species, for example as a list?

Antoni Parellada
  • @李哲源ZheyuanLi Thank you. Something like this works... `list(means = aggregate(. ~ Species, data = iris, FUN = "mean"), sd = aggregate(. ~ Species, data = iris, FUN = "sd"))`. I wonder if it could be made even more succinct... – Antoni Parellada Jan 21 '17 at 00:11
  • Possible duplicate of [Apply function conditionally](http://stackoverflow.com/questions/16657512/apply-function-conditionally) – Barker Jan 21 '17 at 00:55

4 Answers

aggregate is the function you are looking for:

> aggregate(. ~ Species, data = iris, FUN = mean)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
> aggregate(. ~ Species, data = iris, FUN = sd)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa    0.3524897   0.3790644    0.1736640   0.1053856
2 versicolor    0.5161711   0.3137983    0.4699110   0.1977527
3  virginica    0.6358796   0.3224966    0.5518947   0.2746501

aggregate splits the data into subsets defined by one or more grouping factors (here Species) and applies the summary function to each column within each subset.
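Because aggregate accepts a function that returns several values, a single call can compute both statistics. A minimal sketch, assuming matrix columns are acceptable: each measurement becomes a 3 x 2 matrix with `mean` and `sd` columns, and `do.call(data.frame, ...)` flattens them into ordinary columns. The helper name `mean_sd` is hypothetical.

```r
# Hypothetical helper: returns a named vector with both statistics
mean_sd <- function(x) c(mean = mean(x), sd = sd(x))

# One aggregate() call; each measurement column becomes a 3 x 2 matrix
res <- aggregate(. ~ Species, data = iris, FUN = mean_sd)

# Flatten the matrix columns into Sepal.Length.mean, Sepal.Length.sd, ...
flat <- do.call(data.frame, res)
print(flat)
```

This also covers the "more succinct" version asked for in the comments, at the cost of a slightly unusual column structure in `res`.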

Barker

You can split the data by species with the split function to get a list of data frames, then apply your function to each one with lapply:

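# split() returns a named list of data frames, one per species
iris2 <- split(iris, iris$Species)

# Mean and sd of the four numeric columns of one data frame;
# the named vector gives the result rows "mean" and "sd"
fun <- function(df) {
  apply(df[, 1:4], MARGIN = 2, function(x) c(mean = mean(x), sd = sd(x)))
}

lapply(iris2, fun)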
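The same split/lapply idea can be written more compactly by letting sapply() loop over the columns of each subset. Because the inner function returns a named vector, sapply() builds the 2 x 4 matrix with `mean` and `sd` row names directly. A sketch, assuming only the four numeric columns are wanted:

```r
# Keep only the four numeric measurement columns, split by species
by_species <- split(iris[, 1:4], iris$Species)

# For each species, sapply() applies the function to every column and
# binds the named results into a matrix with rows "mean" and "sd"
stats <- lapply(by_species, function(df)
  sapply(df, function(x) c(mean = mean(x), sd = sd(x))))

stats$virginica
```

Individual values can then be read as, e.g., `stats$setosa["sd", "Petal.Width"]`.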
ZsideZ

This is not a complete answer (it doesn't return a list and doesn't keep the same table structure). I include it to point out dplyr's very useful summarise_all:

library(dplyr)
# funs() is deprecated since dplyr 0.8.0; pass a named list of functions instead
df <- iris %>% group_by(Species) %>% summarise_all(list(mean = mean, sd = sd))
df

# A tibble: 3 × 9
# Species Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean Sepal.Length_sd Sepal.Width_sd
# <fctr>             <dbl>            <dbl>             <dbl>            <dbl>           <dbl>          <dbl>
# 1     setosa             5.006            3.428             1.462            0.246       0.3524897      0.3790644
# 2 versicolor             5.936            2.770             4.260            1.326       0.5161711      0.3137983
# 3  virginica             6.588            2.974             5.552            2.026       0.6358796      0.3224966
# ... with 2 more variables: Petal.Length_sd <dbl>, Petal.Width_sd <dbl>
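In dplyr 1.0.0 and later the `_all` variants are superseded by across(). A sketch of the equivalent call, assuming a recent dplyr; note that the columns come out grouped by variable (Sepal.Length_mean, Sepal.Length_sd, ...) rather than by statistic.

```r
library(dplyr)

# across() applies each function in the named list to every non-grouping
# column; result columns are named "{.col}_{.fn}", e.g. Sepal.Length_mean
df2 <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

df2
```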
Andrew Lavers

Another option is data.table:

library(data.table)
as.data.table(iris)[,unlist(lapply(.SD, function(x)
    list(Mean = mean(x), SD = sd(x))), recursive = FALSE), Species]
#     Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
#1:     setosa             5.006       0.3524897            3.428      0.3790644             1.462       0.1736640            0.246
#2: versicolor             5.936       0.5161711            2.770      0.3137983             4.260       0.4699110            1.326
#3:  virginica             6.588       0.6358796            2.974      0.3224966             5.552       0.5518947            2.026
#   Petal.Width.SD
#1:      0.1053856
#2:      0.1977527
#3:      0.2746501
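If you would rather keep the layout from the question (one row per statistic), a variation on the same data.table call stacks the statistics in rows next to a label column. A sketch; the `stat` column name is my own choice, not something data.table provides.

```r
library(data.table)

dt <- as.data.table(iris)

# For each species, return a `stat` label column plus, for every
# measurement column in .SD, a length-2 vector c(mean, sd);
# data.table stacks these into two rows per species
long <- dt[, c(list(stat = c("mean", "sd")),
               lapply(.SD, function(x) c(mean(x), sd(x)))),
           by = Species]

long
```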
akrun