Calculate a mean by criteria in R

Question

I would like to calculate a sample mean in R subject to a specific criterion. For example, I have this table and I want the mean of wage_accepted only for the rows where stage = 1 or 2:

treatment session period stage wage_accepted type 
1            1      1     1            25  low 
1            1      1     3            19  low 
1            1      1     3            15  low 
1            1      1     2            32 high 
1            1      1     2            13  low 
1            1      1     2            14  low 
1            1      2     1            17  low 
1            1      2     4            16  low
1            1      2     5            21  low

The desired output in this case should be:

   stage  mean
      1  21.0 
      2  19.6667

Thanks in advance.

score 4 · Accepted Answer · answered Apr 19 '15 at 00:17

With the dplyr library

library(dplyr)

df %>% filter(stage==1 | stage ==2) %>% group_by(stage) %>%
  summarise(mean=mean(wage_accepted))

If you are new to dplyr a bit of explanation:

Take the data frame df, then keep only the rows where stage is equal to 1 or 2. Then, for each group in stage, calculate the mean of wage_accepted.
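To make the pipeline reproducible, here is a minimal sketch that rebuilds the question's sample table by hand and runs the same steps. The constant treatment and session columns are left out for brevity, and `stage %in% c(1, 2)` is used as an equivalent of the `==` / `|` condition; the name `result` is just an illustrative choice.

```r
library(dplyr)

# The question's sample table, rebuilt as a data frame
# (treatment and session are constant in the sample and omitted here)
df <- data.frame(
  period        = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  stage         = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21),
  type          = c("low", "low", "low", "high", "low", "low", "low", "low", "low")
)

result <- df %>%
  filter(stage %in% c(1, 2)) %>%          # keep only stages 1 and 2
  group_by(stage) %>%                     # one group per remaining stage
  summarise(mean = mean(wage_accepted))   # one mean per group

print(result)  # stage 1 -> 21, stage 2 -> about 19.67
```

This should reproduce the two rows shown in the question's desired output.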

answered Apr 19 '15 at 00:17

dimitris_ps


Thanks, it's useful. However, my real data is much bigger and the above is just an example. I would like to select 25 of the 50 values that a variable takes. In this case (filter stage==1 | .... | stage == 25) would be a little bit long. How can I do it more efficiently? – rado Apr 19 '15 at 00:24
Use `filter(stage %in% 1:25)` – dimitris_ps Apr 19 '15 at 00:25
The variable is qualitative, not quantitative. The values are, for example, 'A', 'B', 'C' and so on... – rado Apr 19 '15 at 00:27
Yeap, you got the logic! – dimitris_ps Apr 19 '15 at 00:29
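As a rough illustration of the point made in these comments, `%in%` works the same way on character values as on numbers. The sketch below uses base R only and a small made-up data frame (the labels and wages are hypothetical, not taken from the question); the same `stage %in% keep` condition can be passed to dplyr's `filter()`.

```r
# Hypothetical data: 'stage' holds labels instead of numbers
df <- data.frame(
  stage         = c("A", "B", "C", "D", "E", "A", "C"),
  wage_accepted = c(10, 20, 30, 40, 50, 30, 50),
  stringsAsFactors = FALSE
)

# The labels of interest; for a longer run, LETTERS[1:25] gives "A" to "Y"
keep <- c("A", "C")

# Keep only the rows whose label is in 'keep'
selected <- df[df$stage %in% keep, ]

# Group means for the selected labels only
means <- tapply(selected$wage_accepted, selected$stage, mean)
print(means)  # A -> 20, C -> 40
```

Storing the wanted values in a vector keeps the condition short no matter how many of the 50 values are selected.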

score 3 · Answer 2 · edited Apr 19 '15 at 01:53

Assuming you have a csv file for the data, you can read data into a data frame using:

data <- read.csv("PATH_TO_YOUR_CSV_FILE/Name_of_the_CSV_File.csv")

Then you can use either this code relying on sapply():

# Column names are case-sensitive; these match the lowercase names in the question's table
sapply(split(data$wage_accepted, data$stage), mean)

       1        2        3        4        5 
21.00000 19.66667 17.00000 16.00000 21.00000 

Or this code relying on tapply():

# Column names are case-sensitive; these match the lowercase names in the question's table
tapply(data$wage_accepted, data$stage, mean)

       1        2        3        4        5 
21.00000 19.66667 17.00000 16.00000 21.00000 
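Both calls return a mean for every stage, while the question asks only for stages 1 and 2. A minimal sketch of one way to narrow the result, using the lowercase column names from the question's table and a hand-built data frame in place of the CSV file (the names `all_means` and `wanted` are illustrative):

```r
# Stand-in for the CSV: only the two columns needed here
data <- data.frame(
  stage         = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# Mean of wage_accepted for every stage
all_means <- tapply(data$wage_accepted, data$stage, mean)

# tapply() names each result by its group, so select by name (as character)
wanted <- all_means[c("1", "2")]
print(wanted)  # 1 -> 21, 2 -> about 19.67
```

Indexing with `c("1", "2")` rather than `c(1, 2)` selects by group label instead of by position, which matters when the stages are not numbered consecutively from 1.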

miles2know · Answer 3 · 2015-04-19T01:02:40.343

2

Check this out. It's a toy example, but data.table is so compact. dplyr is great as well obviously.


    library(data.table)

    dat <- data.table(iris)
    dat[Species == "setosa" | Species == "virginica", mean(Sepal.Width), by = Species]

In terms of your need for speed... data.table is a rocket ship; look it up. I'll leave it to you to apply this to your question. Best, M2K
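For reference, one possible way to apply the same pattern to the question's data is sketched below. The table is rebuilt by hand with only the two relevant columns, and the names `dt` and `res` are illustrative assumptions rather than part of the original answer.

```r
library(data.table)

# The question's stage and wage_accepted columns as a data.table
dt <- data.table(
  stage         = c(1, 3, 3, 2, 2, 2, 1, 4, 5),
  wage_accepted = c(25, 19, 15, 32, 13, 14, 17, 16, 21)
)

# i:  keep rows for stages 1 and 2
# j:  compute the mean of wage_accepted
# by: return one row per stage
res <- dt[stage %in% c(1, 2), .(mean = mean(wage_accepted)), by = stage]
print(res)  # stage 1 -> 21, stage 2 -> about 19.67
```

The `dt[i, j, by]` form does the filtering, the summary and the grouping in a single call, which is what makes the syntax so compact.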

edited Apr 19 '15 at 01:02

answered Apr 19 '15 at 00:56

miles2know


Veerendra Gadekar · Answer 4 · 2015-04-19T00:22:38.480

0

You can compute the mean for every stage first and then filter for the stages you need:

# Calculating mean with respect to stages
df <- do.call(rbind, lapply(split(data, f = data$stage), function(x) {
  data.frame(stage = unique(x$stage), mean = mean(x$wage_accepted))
}))

# mean for stage 1 and 2
required = subset(df, stage %in% c(1,2))

edited Apr 19 '15 at 00:22

answered Apr 19 '15 at 00:14

Veerendra Gadekar

