Mean by factor by level

Question

Maybe this is simple, but I can't find an answer on the web. I have a problem calculating the mean for each level of a factor. My data typically looks like this:

factor, value
a,1
a,2
b,1
b,1
b,1
c,1

I want to get a vector A that contains the mean only for level "a". If I type A at the console, I want to get 1.5. The method for calculating the mean must use factors.

Thank you in advance for help.

@Bartek. If you're going to go through the work of traversing the data frame to find which elements are factor=="a" you might as well perform the operation on the whole dataframe and take advantage of the other means later if needed. — JPC, Apr 30 '14 at 18:53

JPC · Answer 1 · 2014-04-30T18:58:41.783

Take a look at tapply, which lets you split a vector according to one or more factors and apply a function to each subset:

> dat<-data.frame(factor=sample(c("a","b","c"), 10, T), value=rnorm(10))
> r1<-with(dat, tapply(value, factor, mean))
> r1
         a          b          c
 0.3877001 -0.4079463 -1.0837449
> r1[["a"]]
[1] 0.3877001

You can access your results using r1[["a"]] etc.
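As a minimal sketch using the six rows from the question rather than random data (the names `dat`, `means` and `A` are only illustrative), the same tapply call should give the 1.5 the asker expects:

```r
# Data from the question; the grouping column is stored as a factor
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# One mean per factor level, returned as a named 1-d array
means <- with(dat, tapply(value, factor, mean))

# Pick out the mean for level "a" by name
A <- means[["a"]]
A
# [1] 1.5
```

Because tapply computes every level at once, the means for "b" and "c" stay available in `means` if they are needed later.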

Alternatively, one of the popular R packages (plyr) has very nice ways of doing this.

> library(plyr)
> r2<-ddply(dat, .(factor), summarize, mean=mean(value))
> r2
  factor       mean
1      a  0.3877001
2      b -0.4079463
3      c -1.0837449
> subset(r2,factor=="a",select="mean")
       mean
1 0.3877001

You can also use dlply, which takes a data frame and returns a list:

> dlply(dat, .(factor), summarize, mean=mean(value))$a
       mean
1 0.3877001

@Ben indeed, you can just modify the `ddply` call to `ddply(dat, .(factor, factor2), summarize, mean=mean(value))`, and this generalizes to more columns you want to "group" by. Hope that helps — JPC, Jan 15 '20 at 17:44

score 7 · Answer 2 · edited May 01 '14 at 08:59

The following code asks for the mean of value when factor = a:

mean(data$value[data$factor == "a"])
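A short sketch of this subsetting approach on the question's data, assuming the data frame is called `dat` (a hypothetical name). The second part loops over `levels()` to repeat the same subsetting for every level:

```r
# Data from the question; the grouping column is stored as a factor
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# Logical subsetting: keep only the values whose level is "a"
A <- mean(dat$value[dat$factor == "a"])
A
# [1] 1.5

# The same idea applied to each level in turn, giving a named vector
level_means <- sapply(levels(dat$factor),
                      function(lv) mean(dat$value[dat$factor == lv]))
level_means
```

This is the most direct way to get a single level; for all levels at once, a grouping function such as tapply is usually shorter.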

answered Apr 30 '14 at 20:33 by Lenatis · edited May 01 '14 at 08:59 by Thomas

Perfect! This is exactly what I was looking for: how to select a specific factor level. – Darwin PC Jun 25 '19 at 22:44

score 7 · Answer 3 · answered Mar 13 '17 at 14:10

Another simple possibility is the "by" function:

by(value, factor, mean)

You can get the mean of factor level "a" by:

factor_means <- by(value, factor, mean)
factor_means[attr(factor_means, "dimnames")$factor=="a"]
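The calls above assume that `value` and `factor` exist as standalone vectors. A minimal sketch for the case where they are columns of a data frame (here a hypothetical `dat` holding the question's data) wraps the call in `with()`. Indexing by name is shown as a shorter alternative to the `dimnames` comparison:

```r
# Data from the question; the grouping column is stored as a factor
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# by() splits `value` by `factor` and applies mean() to each piece
factor_means <- with(dat, by(value, factor, mean))

# The result is indexed by level name, so "a" can be picked directly
A <- factor_means[["a"]]
A
# [1] 1.5
```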

answered Mar 13 '17 at 14:10 by Ruediger Ziege

how do I use the levels of a factor instead of the factor itself? – Ben Jan 10 '20 at 10:46

David Arenburg · Accepted Answer · 2020-04-15T18:24:09.033

6

Just for fun, here is the data.table solution, although you should probably do what @lukeA suggested:

library(data.table) 
A <- setDT(df)[factor == "a", mean(value)]
## [1] 1.5
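Note that `setDT()` converts `df` to a data.table by reference. A hedged sketch that leaves the original data frame untouched, and also shows the grouped form, could look like this (`df`, `dt` and `all_means` are illustrative names, and the data.table package is assumed to be installed):

```r
library(data.table)

# Data from the question
df <- data.frame(
  factor = c("a", "a", "b", "b", "b", "c"),
  value  = c(1, 2, 1, 1, 1, 1)
)

# as.data.table() makes a copy, so df itself stays a plain data frame
dt <- as.data.table(df)

# Filter rows in i, compute in j: the mean of value where factor is "a"
A <- dt[factor == "a", mean(value)]
A
# [1] 1.5

# Means for every level at once, grouped by the factor column
all_means <- dt[, .(mean = mean(value)), by = "factor"]
all_means
```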

answered Apr 30 '14 at 18:57 by David Arenburg · edited Apr 15 '20 at 18:24

What a truly bizarre programming language R is. – duhaime Nov 06 '18 at 01:34

@duhaime This is a very silly way to do something very simple. I posted this back when I had just joined and was very rep-hungry. If I could, I would delete this answer altogether. BTW, do the solutions in the comments also look bizarre to you? Can you find something less bizarre than `aggregate(value~factor, FUN=mean)` in Python (not to mention that Pandas copied everything from R)? – David Arenburg Nov 06 '18 at 06:15

amen. Python doesn't have anything quite so cute as the aggregate function (which is pretty legible), but on the whole I find Python to be more expressive and easier to read. I find R is generally full of extremely terse statements, which while more compact than Python's syntax, are less easy to read off the page (at least for non-diehards). Reading a function in Python, one immediately sees how to translate it into any number of languages, but not so for R. That said, maybe I just need to drink the koolaid... – duhaime Nov 06 '18 at 13:05

@duhaime Have you heard of the dplyr (or tidyverse) package in R? I believe there is nothing more expressive than that in any language. Regarding Python, there is plenty of confusing stuff there too, like all the list comprehension shortcuts, numpy's `np.reshape(-1,...` trick, the fact that you can exhaust a groupby in an iterator, and so on. But I guess this debate won't lead anywhere :) – David Arenburg Nov 06 '18 at 14:11

score 0 · Answer 5 · answered Feb 28 '22 at 12:05

You can use ddply and pass summary as the function.

library(plyr) # import library
ddply(nameOfTheDataframe, ~ factor, function(data) summary(data$value))

answered Feb 28 '22 at 12:05 by noone
