I'm new to Stack Overflow. I have read a lot of guidance on aggregate(), by() and tapply() but couldn't find an answer.
Using the example from the R help page (warpbreaks is a data set that ships with R):
> aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
wool tension breaks
1 A L 44.55556
2 B L 28.22222
3 A M 24.00000
4 B M 28.77778
5 A H 24.55556
6 B H 18.77778
But how should I write this if I also need the means for all the supersets (like rows 7 to 10 below)?
wool tension breaks
1 A L 44.55556
2 B L 28.22222
3 A M 24.00000
4 B M 28.77778
5 A H 24.55556
6 B H 18.77778
7 A - # mean of the rows where wool = A, with no restriction on tension
8 B -
9 - L # mean of the rows where tension = L, with no restriction on wool
10 - - # mean of the whole data frame
Methods that don't use aggregate() are fine too. Thanks a lot!
Hi all, thanks for your answers! I actually have 40+ subsets and 200+ variables to summarise (not just the single variable "breaks" from the example), so I find it inefficient to use tapply
or aggregate(breaks ~ tension, data = warpbreaks, mean)
and then merge the results. Please tell me if there is a better way to do this kind of data manipulation!
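For reference, here is a minimal sketch of the "aggregate each grouping separately, then merge" approach described above, applied to warpbreaks. The helper names (by_both, by_wool, by_tension, overall, pieces, result) are made up for illustration, and using "-" as the placeholder for a collapsed factor is an assumption taken from the desired output earlier in the question. It works for one variable and two factors, but it clearly does not scale to 40+ subsets and 200+ variables, which is the problem.

```r
# warpbreaks ships with base R (datasets package)
data(warpbreaks)

# Cell means for every wool x tension combination
by_both    <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)
# Marginal means, one grouping variable at a time
by_wool    <- aggregate(breaks ~ wool, data = warpbreaks, FUN = mean)
by_tension <- aggregate(breaks ~ tension, data = warpbreaks, FUN = mean)
# Grand mean over the whole data frame
overall    <- data.frame(breaks = mean(warpbreaks$breaks))

# "-" marks a factor that was collapsed (placeholder is an assumption)
by_wool$tension <- "-"
by_tension$wool <- "-"
overall$wool    <- "-"
overall$tension <- "-"

# Convert the grouping columns to character and align column order
# so the pieces can be stacked with rbind
cols   <- c("wool", "tension", "breaks")
pieces <- lapply(list(by_both, by_wool, by_tension, overall), function(d) {
  d$wool    <- as.character(d$wool)
  d$tension <- as.character(d$tension)
  d[, cols]
})

result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Every extra grouping variable adds more aggregate() calls and more placeholder columns to fill in by hand, which is why I am looking for something more general.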