I have three variables (three columns):

x <- c(1, 1, 2, 2, 2, 2, 1, 1, 2) 
y <- c(1, 1, 1, 2, 2, 2, 3, 3, 3) 

and

 z <- c(10, NA, 16, 25, 41, NA, 17, 53, 26)

For each value of y, I need to calculate the mean of column z over the rows where x==1.

How can I do it using the aggregate function in R?

data <- data.frame(x=c(1, 1, 2, 2, 2, 2, 1, 1, 2), 
                   y=c(1, 1, 1, 2, 2, 2, 3, 3, 3), 
                   z=c(10, NA, 16, 25, 41, NA, 17, 53, 26))

data
  x y  z
1 1 1 10
2 1 1 NA
3 2 1 16
4 2 2 25
5 2 2 41
6 2 2 NA
7 1 3 17
8 1 3 53
9 2 3 26
jbaums
user36363

2 Answers

Here's one way of going about it, using tapply:

with(data, tapply(z, list(x==1, y), mean, na.rm=TRUE)['TRUE', ])

#  1  2  3 
# 10 NA 35
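
To see why indexing with 'TRUE' works, it may help to look at the full table that tapply builds before the row is picked out. This is only a sketch using the sample data from the question; the names m and res are introduced here for illustration.

```r
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Group by two factors at once: whether x == 1 (rows) and y (columns).
# tapply fills cells that have no observations with NA.
m <- with(data, tapply(z, list(x == 1, y), mean, na.rm = TRUE))
m
#        1  2  3
# FALSE 16 33 26
# TRUE  10 NA 35

# Keep only the row for x == 1 (illustrative name).
res <- m["TRUE", ]
```

Note that the 'TRUE' row only exists if at least one row of the data has x==1, so the indexing step would fail on data without any such rows.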

More generally, to apply an arbitrary function to groups where x==1, and return NA for groups that don't have x==1, we can use aggregate and merge:

merge(aggregate(z~y, data[data$x==1,], function(x) {
 c(mean=mean(x, na.rm=TRUE), quantile(x, na.rm=TRUE))
}), list(y=unique(data$y)), all=TRUE)

#   y z.mean z.0% z.25% z.50% z.75% z.100%
# 1 1     10   10    10    10    10     10
# 2 2     NA   NA    NA    NA    NA     NA
# 3 3     35   17    26    35    44     53
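
If the summary needs to go to a CSV file, one possible approach is to flatten it first, because aggregate stores the vector-valued result as a single matrix column called z. The sketch below assumes that do.call(data.frame, ...) flattening is acceptable; the names v, res, flat and out_file, and the temporary file path, are illustrative choices and not part of the original answer.

```r
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Same summary as above; the function argument is named v here only to
# avoid confusion with the column x.
res <- merge(aggregate(z ~ y, data[data$x == 1, ], function(v) {
  c(mean = mean(v, na.rm = TRUE), quantile(v, na.rm = TRUE))
}), list(y = unique(data$y)), all = TRUE)

# Spread the matrix column z into ordinary columns (z.mean and so on).
flat <- do.call(data.frame, res)

# Write to a temporary file; replace out_file with a real path as needed.
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

The quantile column names are adjusted by data.frame during flattening, so they will not contain the % sign in the written file.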
jbaums
  • Thanks! But can I use the quantile function together with mean and write the data (mean and quartiles) to a csv file? – user36363 Jun 14 '14 at 02:09
  • You can do anything if you put your mind to it. ;) – jbaums Jun 14 '14 at 02:10
  • I mean, is it possible to do it in one line? – user36363 Jun 14 '14 at 02:15
  • @user36363, Have you tried to do any of this yourself? That is totally different from your original question. – Rich Scriven Jun 14 '14 at 02:16
  • @user36363: I've answered your additional question. In future though, include all relevant information in the question itself, and if you have a separate question, treat it as such rather than updating your question in comments. Also, as Richard pointed out, this is very likely a duplicate, not to mention that `?aggregate` should have made solving your original question pretty trivial. – jbaums Jun 14 '14 at 02:34
  • OK. Thanks for the explanation – user36363 Jun 14 '14 at 02:44
  • @jbaums: But ?aggregate alone didn't help me, since other functions such as merge are used there as well. Could you point me to books that cover this, please? What should I read to be able to come up with your last formula by myself? – user36363 Jun 14 '14 at 02:50
  • @jbaums, Hi, one more question, please. I am interested in how, along with calculating the means and quartiles, I could include one more column W and calculate the sum of W for each Y, where X=1. Is it possible to do that in one line? – user36363 Jun 14 '14 at 04:07
  • @user36363: Modify the function (i.e. this bit: `c(mean=mean(x, na.rm=TRUE), quantile(x, na.rm=TRUE))`). There's a clear pattern there. To include the mean in the output vector we do `mean(x, na.rm=TRUE)`. To include the default quantiles we use `quantile(x, na.rm=TRUE)`. To include the sum, we use... I'll leave this exercise to you. – jbaums Jun 14 '14 at 04:29
  • @jbaums, You described the calculation of the mean and quantiles of column Z. But I mean to consider one more column W and calculate its sum, and to do it along with the previous calculations. – user36363 Jun 14 '14 at 04:33
  • @user36363: Ok, do a separate simple `aggregate` or `tapply` (see my first chunk of code) over `W`, and `cbind` the output to the `z` summaries that you generate with my second chunk of code. Please don't keep commenting on this post. If you have new questions, post them as new questions (or find existing SO posts that address your question). All solutions on this post already fully address the present question. – jbaums Jun 14 '14 at 04:38
  • @jbaums Ok, thank you! Just one last question: which R textbook could you recommend? – user36363 Jun 14 '14 at 04:48
  • @user36363 See [here](http://cran.r-project.org/manuals.html) and [here](http://cran.r-project.org/other-docs.html) – jbaums Jun 14 '14 at 04:53
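
The suggestion in the comments about summarising a further column W separately and then combining the results could look roughly like this. It is a sketch under assumptions: the column w and its values are made up purely for illustration, and merge on y is used instead of cbind so that the rows are matched by group and do not rely on row order.

```r
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Hypothetical extra column; these values are invented for the example.
data$w <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

sub <- data[data$x == 1, ]

# Mean of z per y (rows with NA in z are dropped by the default na.action).
z_mean <- aggregate(z ~ y, sub, mean)

# Sum of w per y; only NAs in w or y would be dropped here, so the row
# where z is NA still contributes to the sum.
w_sum <- aggregate(w ~ y, sub, sum)

# Combine both summaries by y, then add back the y groups without x == 1.
result <- merge(merge(z_mean, w_sum, by = "y"),
                data.frame(y = unique(data$y)), all = TRUE)
result
```

With the made-up values above, y==2 gets NA in both summary columns because it has no rows with x==1.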

Here is another one-liner with aggregate, for the sake of golf.

aggregate(z~y, within(data, z <- ifelse(x==1,z,NA)), mean, na.rm=TRUE, na.action=na.pass)

It is suboptimal, and it returns NaN instead of NA for y==2, just as mean(numeric(0)) does.
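
If NA is preferred over NaN for the empty group, the result can be patched afterwards. This is a minimal sketch, assuming it is fine to post-process the z column; the name res is illustrative.

```r
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Same one-liner as above: z is blanked out wherever x != 1, and
# na.pass keeps those rows so every y group reaches mean().
res <- aggregate(z ~ y, within(data, z <- ifelse(x == 1, z, NA)),
                 mean, na.rm = TRUE, na.action = na.pass)

# The mean of an empty numeric vector is NaN, which is what y == 2 gets
# once all of its NA values are removed.
is.nan(mean(numeric(0)))

# Replace NaN with NA; is.nan() is FALSE for ordinary NA values.
res$z[is.nan(res$z)] <- NA
res
```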

mlt
  • since we're playing golf, how bout this: `merge(aggregate(z~y, d[d$x==1,], mean), list(y=1:3), all=T)`? – jbaums Jun 14 '14 at 02:24
  • @jbaums nice! I'd use also unique(data$y) instead of hard coded 1:3 :-) – mlt Jun 14 '14 at 02:27