Breaking the tapply junkie habit

Question

I've learned R by toying with it, and I'm starting to think that I'm abusing the tapply function. Are there better ways to do some of the following? Granted, they work, but as they get more complex I wonder if I'm missing out on better options. I'm looking for some criticism here:

tapply(var1, list(fac1, fac2), mean, na.rm=T)

tapply(var1, fac1, sum, na.rm=T) / tapply(var2, fac1, sum, na.rm=T)

cumsum(tapply(var1, fac1, sum, na.rm=T)) / sum(var1)

Update: Here's some example data...

     var1    var2 fac1           fac2
1      NA  275.54   10      (266,326]
2      NA  565.89   10      (552,818]
3      NA  815.41    6      (552,818]
4      NA  281.77    6      (266,326]
5      NA  640.24   NA      (552,818]
6      NA   78.42   NA     [78.4,266]
7      NA 1027.06   NA (818,1.55e+03]
8      NA  355.20   NA      (326,552]
9      NA  464.52   NA      (326,552]
10     NA 1397.11   10 (818,1.55e+03]
11     NA  229.82   NA     [78.4,266]
12     NA  542.77   NA      (326,552]
13     NA  829.32   NA (818,1.55e+03]
14     NA  284.78   NA      (266,326]
15     NA  194.97   10     [78.4,266]
16     NA  672.55    8      (552,818]
17     NA  348.01   10      (326,552]
18     NA 1550.79    9 (818,1.55e+03]
19 101.98  101.98    4     [78.4,266]
20     NA  292.80    6      (266,326]

Update data dump:

structure(list(var1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 101.98, NA), var2 = c(275.54, 
565.89, 815.41, 281.77, 640.24, 78.42, 1027.06, 355.2, 464.52, 
1397.11, 229.82, 542.77, 829.32, 284.78, 194.97, 672.55, 348.01, 
1550.79, 101.98, 292.8), fac1 = c(10L, 10L, 6L, 6L, NA, NA, NA, 
NA, NA, 10L, NA, NA, NA, NA, 10L, 8L, 10L, 9L, 4L, 6L), fac2 = structure(c(2L, 
4L, 4L, 2L, 4L, 1L, 5L, 3L, 3L, 5L, 1L, 3L, 5L, 2L, 1L, 4L, 3L, 
5L, 1L, 2L), .Label = c("[78.4,266]", "(266,326]", "(326,552]", 
"(552,818]", "(818,1.55e+03]"), class = "factor")), .Names = c("var1", 
"var2", "fac1", "fac2"), row.names = c(NA, -20L), class = "data.frame")

Just as a comment: while these are clear examples, it would be easier to help if you provided sample data for var1, fac1, etc. — Shane, Sep 16 '09 at 17:36
Suggestion: could you use the dput() function to extract the structure of that sample data, and then paste the results here? Makes it a breeze to import. — Matt Parker, Sep 16 '09 at 18:32
Another idea is to use something from the "datasets" package, which comes with R: ?datasets. Then no extra work is required for replication. — Shane, Sep 16 '09 at 18:40
I know I'm already in trouble if I can't even get the example right... added a dput of the example df. Keep in mind I'm unabashedly using attach() to get to the data in this scenario. — Totovader, Sep 16 '09 at 19:50

score 4 · Answer 1 · answered Sep 16 '09 at 20:19

For part 1, I prefer aggregate because it keeps the data in a more R-like format, with one observation per row.

aggregate(var1, list(fac1, fac2), mean, na.rm=T)
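To make the difference in shape concrete, here is a minimal sketch on the built-in mtcars data. That dataset is my assumption for illustration only (it is not the asker's data), with mpg, cyl and gear standing in for var1, fac1 and fac2. tapply returns a matrix with one cell per factor combination, including empty ones, while aggregate returns a data frame with one row per combination that actually occurs.

```r
# Illustrative stand-ins: mpg plays var1, cyl plays fac1, gear plays fac2.
# mtcars ships with R in the datasets package, so no setup is needed.

# tapply: a cyl-by-gear matrix; combinations with no data become NA.
wide <- tapply(mtcars$mpg, list(mtcars$cyl, mtcars$gear), mean, na.rm = TRUE)

# aggregate: a data frame with one row per observed combination.
# Naming the list elements gives readable column names; the value column is "x".
long <- aggregate(mtcars$mpg,
                  list(cyl = mtcars$cyl, gear = mtcars$gear),
                  mean, na.rm = TRUE)

dim(wide)    # 3 3  (includes the empty 8-cylinder / 4-gear cell)
nrow(long)   # 8    (the empty combination is simply absent)
```

The long form is usually easier to feed into merge() or plotting functions, whereas the matrix from tapply is handy when you want to read the result as a cross-tabulation.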

Peter
