25

This may be simple, but I can't find an answer on the web. I have a problem calculating the mean for each level of a factor. My data typically looks like this:

factor, value
a,1
a,2
b,1
b,1
b,1
c,1

I want a vector `A` that contains the mean for level "a" only. If I type `A` at the console, I want to get 1.5. The method for calculating the mean must use factors.

Thank you in advance for help.

Bartek Taciak
  • 18
    Try `aggregate(value~factor, FUN=mean)` – Thomas Apr 30 '14 at 18:31
  • 6
    Or `A <- mean(data$value[data$factor == "a"])` – lukeA Apr 30 '14 at 18:32
  • 1
    @Bartek. If you're going to go through the work of traversing the data frame to find which elements are factor=="a" you might as well perform the operation on the whole dataframe and take advantage of the other means later if needed. – JPC Apr 30 '14 at 18:53

5 Answers

34

Take a look at `tapply`, which lets you split a vector according to one or more factors and apply a function to each subset.

> dat<-data.frame(factor=sample(c("a","b","c"), 10, TRUE), value=rnorm(10))
> r1<-with(dat, tapply(value, factor, mean))
> r1
         a          b          c
 0.3877001 -0.4079463 -1.0837449
> r1[["a"]]
[1] 0.3877001

You can access your results using r1[["a"]] etc.
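As a minimal sketch, the same `tapply` call can be run on the six rows from the question instead of random data. The name `dat` follows the answer above; `level_means` is a name chosen here for illustration, and `A` is the vector the question asks for.

```r
# Data from the question, with the grouping column stored as a factor
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# One mean per factor level, returned as a named array
level_means <- with(dat, tapply(value, factor, mean))

# Mean for level "a" only
A <- level_means[["a"]]
A
# [1] 1.5
```

Computing all level means once and indexing by name keeps the means for "b" and "c" available if they are needed later.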

Alternatively, one of the popular R packages (plyr) has very nice ways of doing this.

> library(plyr)
> r2<-ddply(dat, .(factor), summarize, mean=mean(value))
> r2
  factor       mean
1      a  0.3877001
2      b -0.4079463
3      c -1.0837449
> subset(r2,factor=="a",select="mean")
       mean
1 0.3877001

You can also use `dlply`, which takes a data frame and returns a list:

> dlply(dat, .(factor), summarize, mean=mean(value))$a
       mean
1 0.3877001
JPC
  • 1
    Is it possible to use ddply with two factors? – Ben Jan 15 '20 at 09:30
  • 1
    @Ben indeed, you can just modify the `ddply` call to `ddply(dat, .(factor, factor2), summarize, mean=mean(value))`, and this generalizes to more columns you want to "group" by. Hope that helps – JPC Jan 15 '20 at 17:44
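For readers without plyr, grouping by two columns can also be sketched in base R with `aggregate`, swapped in here for `ddply`. The second grouping column `factor2` and its values are made up for illustration; the other columns are the question's data.

```r
# Question data plus a hypothetical second grouping column
dat2 <- data.frame(
  factor  = c("a", "a", "b", "b", "b", "c"),
  factor2 = c("x", "x", "x", "y", "y", "x"),  # assumed values, not from the thread
  value   = c(1, 2, 1, 1, 1, 1)
)

# One row per observed combination of factor and factor2
means2 <- aggregate(value ~ factor + factor2, data = dat2, FUN = mean)
means2

# Mean for the combination factor == "a", factor2 == "x"
a_x <- means2$value[means2$factor == "a" & means2$factor2 == "x"]
a_x
# [1] 1.5
```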
7

The following code computes the mean of `value` for the rows where `factor == "a"`:

mean(data$value[data$factor == "a"])
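A minimal sketch of that one-liner on the question's data, split into two steps to show the logical mask it relies on. The data frame name `data` follows the answer; `is_a` is a name introduced here.

```r
# Data from the question
data <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# TRUE for rows whose level is "a", FALSE otherwise
is_a <- data$factor == "a"

# Keep only the matching values and average them
A <- mean(data$value[is_a])
A
# [1] 1.5
```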
Thomas
Lenatis
7

Another simple possibility would be the `by` function:

by(value, factor, mean)

You can get the mean of factor level "a" by:

factor_means <- by(value, factor, mean)
factor_means[attr(factor_means, "dimnames")$factor=="a"]
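A sketch of the same `by` call on the question's data, assuming the columns live in a data frame called `dat` (a name not used in this answer). Because the result carries the level names, indexing with `[["a"]]` is a shorter way to pull out one level.

```r
# Data from the question
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# with() makes the columns visible so the call matches the answer
factor_means <- with(dat, by(value, factor, mean))

# Extract the mean for level "a" by name
A <- factor_means[["a"]]
A
# [1] 1.5
```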
Ruediger Ziege
6

Just for fun, here is the data.table solution, although you should probably do what @lukeA suggested:

library(data.table) 
A <- setDT(df)[factor == "a", mean(value)]
## [1] 1.5
David Arenburg
  • 6
    What a truly bizarre programming language R is. – duhaime Nov 06 '18 at 01:34
  • 2
    @duhaime This is a very silly way to do something very simple. I posted this back when I had just joined and was very rep hungry. If I could, I would delete this answer altogether. BTW, do the solutions in the comments also look bizarre to you? Can you find something less bizarre than `aggregate(value~factor, FUN=mean)` in Python (not to mention that Pandas copied everything from R)? – David Arenburg Nov 06 '18 at 06:15
  • 1
    Amen. Python doesn't have anything quite so cute as the aggregate function (which is pretty legible), but on the whole I find Python to be more expressive and easier to read. I find R is generally full of extremely terse statements, which, while more compact than Python's syntax, are less easy to read off the page (at least for non-diehards). Reading a function in Python, one immediately sees how to translate it into any number of languages, but not so for R. That said, maybe I just need to drink the Kool-Aid... – duhaime Nov 06 '18 at 13:05
  • 1
    @duhaime Have you heard of the dplyr (or tidyverse) package in R? There is nothing more expressive than that in any language, I believe. Regarding Python, there is so much confusing stuff there, like all these list comprehension shortcuts, and numpy has the `np.reshape(-1,...` trick. You can exhaust the groupby in an iterator, and so on. But I guess this debate won't lead anywhere :) – David Arenburg Nov 06 '18 at 14:11
0

You can use ddply and pass summary as the function.

library(plyr) # import library
ddply(nameOfTheDataframe, ~ factor, function(data) summary(data$value))
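If plyr is not available, a similar per-level summary table can be sketched in base R with `split` and `sapply`. This is a swapped-in alternative, not the `ddply` call above; the names `dat` and `level_summaries` are chosen here for illustration.

```r
# Data from the question
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# split() groups the values by level; summary() runs on each group.
# sapply() returns one column per level, so transpose to get one row per level.
level_summaries <- t(sapply(split(dat$value, dat$factor), summary))
level_summaries

# The mean for level "a" is one cell of that table
level_summaries["a", "Mean"]
```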
noone