I'm having trouble getting R's aggregate() function to return a data.frame in the format that I'd like.
Basically I run the aggregation like so:
aggregate(df$res, list(full$depth), summary)
where the res column contains TRUE, FALSE and NA. I want to calculate the number of times each value of res occurs according to the groups in depth, which are six numeric depth values: 0, 5, 15, 30, 60 and 100. According to the help page on the aggregate function, it coerces the by values to factors, so this oughtn't be a problem (as far as I can tell).
So I run the aggregate function and store it in a data.frame. This is fine; it runs without error. The summary displayed in the R console looks like this:
  Group.1  x.Mode x.FALSE x.TRUE x.NA's
1       0 logical       3     83      0
2       5 logical       3     83      0
3      15 logical       8     78      0
4      30 logical       5     79      2
5      60 logical       1     64     21
6     100 logical       1     24     61
Again, this is fine, and looks like what I want. But the data.frame containing the results actually has only two columns, and looks like this:
   Group.1       x
1        0 logical
2        5 logical
3       15 logical
4       30 logical
5       60 logical
6      100 logical
7                3
8                3
9                8
10               5
11               1
12               1
13              83
14              83
15              78
16              79
17              64
18              24
19               0
20               0
21               0
22               2
23              21
24              61
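One way to check what is actually stored is to inspect the structure of the result rather than its printed form. The sketch below uses hypothetical toy data and a hypothetical helper, count_res(), as a stand-in for summary() (whose output for logical vectors can differ between R versions). It assumes that when the function passed to aggregate() returns several values per group, those values end up in a single matrix column named x, which would be consistent with a two-column data.frame that prints as if it had more.

```r
# Hypothetical toy data standing in for the real data set
df <- data.frame(
  depth = c(0, 0, 0, 5, 5, 5, 15, 15),
  res   = c(TRUE, FALSE, NA, TRUE, TRUE, NA, FALSE, FALSE)
)

# Hypothetical stand-in for summary(): a fixed-length named vector per group
count_res <- function(v) {
  c(n_false = sum(!v, na.rm = TRUE),
    n_true  = sum(v, na.rm = TRUE),
    n_na    = sum(is.na(v)))
}

agg <- aggregate(df$res, list(df$depth), count_res)

ncol(agg)         # number of top-level columns in the data.frame
is.matrix(agg$x)  # whether the x column is itself a matrix
dim(agg$x)        # groups by values-per-group
str(agg)          # shows the nested structure directly
```

If is.matrix(agg$x) is TRUE, the counts are all there, just nested inside one column rather than spread over separate ones.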
I understand from the aggregate() help page that:
If the by has names, the non-empty times are used to label the columns in the results, with unnamed grouping variables being named Group.i for by[[i]].
which suggests to me that if the by has names then the output data.frame would look more like the summary of it that gets printed to the R console (i.e. it'd have 5 columns including a column of counts for each level in by) than the two-column version it actually gets saved as. The trouble is that the help page doesn't explain at all what a named by variable is, especially if it's coerced to a list from a data.frame column as in my case.
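For what it's worth, my reading of "names" here is simply the names of the list passed as by, as in list(depth = ...). The sketch below, on hypothetical toy data, compares an unnamed and a named by. It suggests that naming only changes the label of the grouping column and does not change how many columns the result has.

```r
# Hypothetical toy data standing in for the real data set
df <- data.frame(
  depth = c(0, 0, 0, 5, 5, 5, 15, 15),
  res   = c(TRUE, FALSE, NA, TRUE, TRUE, NA, FALSE, FALSE)
)

# A simple one-value-per-group function: count of TRUE values
n_true <- function(v) sum(v, na.rm = TRUE)

# Unnamed list: the grouping column gets the default label
agg_unnamed <- aggregate(df$res, by = list(df$depth), FUN = n_true)

# Named list: the list element name becomes the column label
agg_named <- aggregate(df$res, by = list(depth = df$depth), FUN = n_true)

names(agg_unnamed)
names(agg_named)
```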
What do I need to do differently in order for the data.frame that results from aggregate() to have a column of counts for each level of by, as the help suggests it could if I knew what I was doing?
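A possible workaround, sketched on hypothetical toy data: if the aggregated values are held in a single matrix column, passing the result through do.call(data.frame, ...) should expand that matrix into ordinary columns. The helper count_res() is a hypothetical stand-in for summary(), chosen so that every group returns the same three named values regardless of R version.

```r
# Hypothetical toy data standing in for the real data set
df <- data.frame(
  depth = c(0, 0, 0, 5, 5, 5, 15, 15),
  res   = c(TRUE, FALSE, NA, TRUE, TRUE, NA, FALSE, FALSE)
)

# Hypothetical stand-in for summary(): a fixed-length named vector per group
count_res <- function(v) {
  c(n_false = sum(!v, na.rm = TRUE),
    n_true  = sum(v, na.rm = TRUE),
    n_na    = sum(is.na(v)))
}

agg <- aggregate(df$res, list(df$depth), count_res)

# Rebuild the data.frame from its components; the matrix column x
# is split into one column per matrix column, prefixed with "x."
flat <- do.call(data.frame, agg)

names(flat)
flat
```

The same idea should apply when summary is used as the function, although the resulting columns may then be character rather than numeric because of the Mode entry.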