-2

I have four vectors (columns)

 x y z  t
 1 1 1 10
 1 1 1 15
 2 4 1 14
 2 3 1 15
 2 2 1 17
 2 1 2 19
 1 4 2 18
 1 4 2 NA
 2 2 2 45
 3 3 2 NA
 3 1 3 59
 4 3 3 23
 1 4 3 45
 4 4 4 74
 2 1 4 86

How can I calculate mean and median of vector t, for each value of vector y (from 1 to 4) where x=1, z=1, using aggregate function in R?

It was discussed how to do it with 3 parameters (Multiple Aggregation in R) but it`s a little unclear how to do it with 4 parameters.

Thank you.

Community
  • 1
  • 1
user36363
  • 3
  • 5
  • The biggest problem with trying to use aggregate is that you want different aggregation functions (mean and median). `aggregate()` only really uses one at a time. – MrFlick Jun 15 '14 at 03:32
  • 3
    @MrFlick: not really - `aggregate(t~y, data=subset(d, x==1 & z==1), function(x) c(mean=mean(x), median=median(x)))` – jbaums Jun 15 '14 at 16:07
  • 1
    @jbaums - the downside of that of course being that you end up with a matrix inside your dataframe containing columns `mean` and `median`. The structure ends up a bit twisted. Nothing `do.call(data.frame,output)` can't fix, but still a bit of a nuisance. – thelatemail Jul 11 '14 at 00:57

1 Answers1

1

You could try something like this in data.table

data <- data.table(yourdataframe)

bar <- data[,.N,by=y]
foo <- data[x==1 & z==1,list(mean.t=mean(t,na.rm=T),median.t=median(t,na.rm=T)),by=y]
merge(bar[,list(y)],foo,by="y",all.x=T)

   y mean.t median.t
1: 1   12.5     12.5
2: 2     NA       NA
3: 3     NA       NA
4: 4     NA       NA

You probably could do the same in aggregate, but I am not sure you can do it in one easy step.

An answer to to an additional request in the comments...

bar <- data.table(expand.grid(y=unique(data$y),z=unique(data[z %in% c(1,2,3,4),z])))
foo <- data[x==1 & z %in% c(1,2,3,4),list(
  mean.t=mean(t,na.rm=T),
  median.t=median(t,na.rm=T),
  Q25.t=quantile(t,0.25,na.rm=T),
  Q75.t=quantile(t,0.75,na.rm=T)
  ),by=list(y,z)]
merge(bar[,list(y,z)],foo,by=c("y","z"),all.x=T)

    y z mean.t median.t Q25.t Q75.t
 1: 1 1   12.5     12.5 11.25 13.75
 2: 1 2     NA       NA    NA    NA
 3: 1 3     NA       NA    NA    NA
 4: 1 4     NA       NA    NA    NA
 5: 2 1     NA       NA    NA    NA
 6: 2 2     NA       NA    NA    NA
 7: 2 3     NA       NA    NA    NA
 8: 2 4     NA       NA    NA    NA
 9: 3 1     NA       NA    NA    NA
10: 3 2     NA       NA    NA    NA
11: 3 3     NA       NA    NA    NA
12: 3 4     NA       NA    NA    NA
13: 4 1     NA       NA    NA    NA
14: 4 2   18.0     18.0 18.00 18.00
15: 4 3   45.0     45.0 45.00 45.00
16: 4 4     NA       NA    NA    NA
Mike.Gahan
  • 4,565
  • 23
  • 39
  • Thank you, I tried to do that but got the error: unused argument (by = y) – user36363 Jun 15 '14 at 03:41
  • Did you make it into a ```data.table``` object first? – Mike.Gahan Jun 15 '14 at 03:45
  • Do you want when x=1 AND where z=1..or do you want do you want when x=1 OR when y=1? That could be the difference. If not, you are going to have to post the desired output because it is unclear to me. – Mike.Gahan Jun 15 '14 at 04:04
  • It works now, I want to do it for x=1 AND z=1. And under these conditions y can be equal only to 1 and I get one row in results for where y=1, But how can I get results for each y, for example if y=2,3,4 I need get NA in results. – user36363 Jun 15 '14 at 04:10
  • Thank you. That`s good. I`m just beginner in R and wondering if it is possible to do the following: we got for rows for z=1 but can we do another 4 rows for z=2,3 and 4 and to it in one formula? to get 16 rows in result – user36363 Jun 15 '14 at 04:59
  • just added it to the answer – Mike.Gahan Jun 15 '14 at 11:52
  • Thank you! In that example you calculate mean and median, but is it possible to calculate quantiles as well? I tried to do it but got the error – user36363 Jun 15 '14 at 15:24
  • 2
    yeah..adding quantiles or standard deviations is easy. see above. please mark this as answered. – Mike.Gahan Jun 15 '14 at 16:01
  • Hello Mike. I like your method and I used it a lot. But I have one more question. We have the same data and we need to make groups by Y and Z, but not for each number from Z, but split Z by groups: if 0 < z <= 2 group I, if 2 < z <= 3 group II and if 3 < z <= 4 group III. And do all the same calculations but for those three groups. Thank you! – user36363 Jul 11 '14 at 14:11
  • Just change the ```by``` statement slightly. Instead of ```by=list(y,z)]```, use ```by=list(y,z.grp=cut(z,c(-Inf,2,3,Inf))]``` – Mike.Gahan Jul 12 '14 at 16:48
  • Hi Mike. Thanks for your help, Your approach is the most simple, I like it. I have one more question nobody can answer, please take a look at it, thanks in advance: http://stackoverflow.com/questions/24697468/conditional-calculating-the-numbers-of-values-in-column-with-r – user36363 Jul 12 '14 at 19:49