
I have a data frame with something like the following structure:

Trial Index    Condition1    Condition2    Measures
1              A             Y             ...
2              A             Y             ...        
3              B             Y             ...
4              B             Y             ...
5              A             Z             ...
6              A             Z             ...        
7              B             Z             ...
8              B             Z             ...

I would like to compute a number of summary measures on each combination of Condition1 and Condition2, and for the margins. I can use multiple calls to ddply to do this, but I was wondering if there is some simple way to get a single data structure out of it, something like:

Condition1    Condition2    Mean    Median    ....
A             Y             ...     ...       ....
A             Z             ...     ...       ....
A             -             ...     ...       ....             
B             Y             ...     ...       ....
B             Z             ...     ...       ....
B             -             ...     ...       ....
-             Y             ...     ...       ....
-             Z             ...     ...       ....
Nathan
  • That's more or less what I already have. What I want out of it is for the function to compute the margins as well, e.g. the mean and median of _all_ the samples in Condition1. The above code only gives the mean and median for each combination of Condition1 and Condition2. – Nathan Jan 09 '13 at 17:38
  • That would be the mean of all samples. I would like the mean of all samples in Condition1. That itself is rather simple, but what I would like is if I could get each mean (mean of all samples, mean of all samples in condition1, mean of all samples in condition2, means of samples in each combination of condition1 and condition2) in the same data frame, with code as parsimonious as possible. – Nathan Jan 09 '13 at 18:39
  • That's what I thought. Thanks for sticking through it anyways! – Nathan Jan 09 '13 at 19:58

2 Answers

Complaints have been issued over the years about R's difficulties with "reporting". There really are no built-in functions for inserting subtotals (or sub-means) and grand totals within tables. The interfaces to the SQL drivers could provide some of the remedy, but I wouldn't call that simple, and since you are not posing your question in database terms, I'm guessing that's not good for you. Here is an all-base-R approach from a prior answer that uses sums as the outcome:

R: calculating column sums & row sums as an aggregation from a dataframe
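
For the mean/median case in the question, the same base-R idea can be sketched as follows. This is a minimal sketch, not the code from the linked answer: the column names Condition1, Condition2 and Measure and the eight data values are hypothetical, and "-" marks a margin as in the desired output above. Stacking copies of the data with one or both grouping columns collapsed to "-" lets a single aggregate() call return the cell, marginal and grand statistics in one data frame.

```r
# Hypothetical data shaped like the question's: two grouping columns, one measure.
d <- data.frame(
  Condition1 = c("A", "A", "B", "B", "A", "A", "B", "B"),
  Condition2 = c("Y", "Y", "Y", "Y", "Z", "Z", "Z", "Z"),
  Measure    = c(1, 3, 5, 7, 2, 4, 6, 10),
  stringsAsFactors = FALSE
)

# Stack four copies: the original data, one with Condition2 collapsed to "-",
# one with Condition1 collapsed, and one with both collapsed (grand total).
stacked <- rbind(
  d,
  transform(d, Condition2 = "-"),
  transform(d, Condition1 = "-"),
  transform(d, Condition1 = "-", Condition2 = "-")
)

# One call computes every statistic for every cell, margin and the grand total.
summ <- aggregate(Measure ~ Condition1 + Condition2, data = stacked,
                  FUN = function(v) c(Mean = mean(v), Median = median(v)))

# aggregate() returns the statistics as a matrix column; flatten it into
# ordinary columns named Measure.Mean and Measure.Median.
summ <- do.call(data.frame, summ)
print(summ)
```

Stacking duplicates the data once per combination of collapsed factors (four copies here), which is fine for moderate sizes but grows quickly as more grouping factors are added.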

There might be an avenue for progress if you constructed an array with marginals and then "flattened" it with ftable. See here:

Grouping and Sorting in R
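
The array-plus-ftable route could look roughly like this. Again this is a hedged sketch on hypothetical data: with_margins is an illustrative helper, not a library function. It builds a c1 x c2 x statistic array in which an extra "All" level holds the marginal and grand values, and ftable() then flattens it into one row per c1/c2 combination with one column per statistic.

```r
# Hypothetical data with the question's layout (factors c1, c2; numeric measures).
d <- data.frame(
  c1 = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  c2 = factor(c("Y", "Y", "Y", "Y", "Z", "Z", "Z", "Z")),
  measures = c(1, 3, 5, 7, 2, 4, 6, 10)
)

# Hypothetical helper: a 3 x 3 matrix of f() applied to each c1/c2 cell, with
# an extra "All" column (c1 margins), "All" row (c2 margins) and grand value.
with_margins <- function(f) {
  cells <- tapply(d$measures, list(d$c1, d$c2), f)
  rbind(cbind(cells, All = as.vector(tapply(d$measures, d$c1, f))),
        All = c(as.vector(tapply(d$measures, d$c2, f)), f(d$measures)))
}

stats <- list(mean = mean, median = median)
# Stack one matrix per statistic into a 3-d array (c1 x c2 x stat).
arr <- array(unlist(lapply(stats, with_margins)),
             dim = c(3, 3, length(stats)),
             dimnames = list(c1 = c(levels(d$c1), "All"),
                             c2 = c(levels(d$c2), "All"),
                             stat = names(stats)))

# Flatten: rows are c1/c2 combinations (margins included), columns are statistics.
ft <- ftable(arr, row.vars = c("c1", "c2"))
print(ft)
```

Computing the margins directly from the raw data (rather than averaging the cell means) keeps them correct when the design is unbalanced.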

There is also the tables package by Duncan Murdoch, which is probably the closest I can come to an answer. But I think the answer to the specific question "is there some simple way" to get an R object with the requested complexity is ... no, at least none that I am aware of.

IRTFM

@DWin is right, the tables package might be the right clue here. Without taking care of formatting, here's an example:

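library(tables)
# Random data, so your numbers will differ; c1 and c2 must be factors
d1 <- data.frame(id = 1:10, c1 = factor(sample(c("a", "b"), 10, replace = TRUE)),
        c2 = factor(sample(c("c", "d"), 10, replace = TRUE)), measures = rnorm(10))
# Rows: c1, c2, every c1-by-c2 cell, and an "All" row (the 1); columns: mean, median
t1 <- tabular((c1 + c2 + c1*c2 + 1) ~ (measures)*(mean + median), data = d1)
t1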

             measures        
             mean     median 
      c1 a   -0.33306 -0.1801
         b   -0.54121 -0.6381
      c2 c   -0.04862  0.1647
         d   -0.69615 -0.8129
 c1 a c2 c   -0.26195 -0.2619
         d   -0.38047 -0.1801
    b    c    0.16472  0.1647
         d   -1.01182 -1.1863
         All -0.43713 -0.4678

It takes a while to get into the syntax though; on the plus side it provides functionality to export the tables to LaTeX. If you don't want/need all the labeling in that tabular object you can extract the values via as.matrix(t1, format = as.numeric).

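NOTE: c1 and c2 on the left-hand side of the formula have to be factors for this to work. Since R 4.0.0, data.frame() no longer converts character columns to factors by default, so wrap them in factor().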

adibender