Filtering data frame with Matrix column in R

Question

I have a very simple data frame that has an ID column and a column that indicates whether the row was a true positive or not (either 1 or 0). I aggregated the data by ID using plyr's each function, calculating the number of occurrences of each ID and the mean of the true-positive column:

agg <- aggregate(tp ~ v_id, data, each(mean, length))

That seemed to work well, and I got the following data:

head(agg)
                v_id tp.mean tp.length
1             A51599     1.0         4
2             A51670     1.0         2
3             A51672     1.0         2
4             A51676     1.0         2
5             A51677     1.0         2
6             A51678     0.5         2

That data is nice, but now I would like to filter out all rows where tp.length is less than 100. I tried all kinds of things with the subset function as well as with the '[]' operator with conditions in it. The tp column seems to be a matrix, and I have no idea how to get at tp.length in the filter.

Thank you!

`do.call(data.frame, agg)` to get it into a more standard form — user20650, Mar 01 '15 at 03:10

bjoseph · Accepted Answer · 2015-03-01T03:24:30.993

Using the warpbreaks data for easy reproducibility:

data(warpbreaks)
agg<-aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
head(agg)
  wool tension   breaks
1    A       L 44.55556
2    B       L 28.22222
3    A       M 24.00000
4    B       M 28.77778
5    A       H 24.55556
6    B       H 18.77778

agg<-agg[agg$breaks<44,]
head(agg)
  wool tension   breaks
2    B       L 28.22222
3    A       M 24.00000
4    B       M 28.77778
5    A       H 24.55556
6    B       H 18.77778

Hadley Wickham has a good chapter on subsetting here: http://adv-r.had.co.nz/Subsetting.html

You may also want to check that your column is integer or numeric by calling class(agg$tp.length)

Edit: The comment below is totally right. When the function passed to aggregate returns two or more values (as plyr's each does when given two functions), the result contains a single column of class matrix. It can be subset in several ways:

agg = aggregate(mpg ~ am , mtcars, function(i) c(mean(i), sd(i))) 
head(agg)
  am     mpg.1     mpg.2
1  0 17.147368  3.833966
2  1 24.392308  6.166504
str(agg)
'data.frame':   2 obs. of  2 variables:
 $ am : num  0 1
 $ mpg: num [1:2, 1:2] 17.15 24.39 3.83 6.17
class(agg[,2])
[1] "matrix"

You can copy the individual matrix columns into ordinary columns of your data.frame and then subset as in my original answer.

agg$mpg1<-agg[,2][,1]
agg$mpg2<-agg[,2][,2]

head(agg)
  am     mpg.1     mpg.2     mpg1     mpg2
1  0 17.147368  3.833966 17.14737 3.833966
2  1 24.392308  6.166504 24.39231 6.166504
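
The matrix column can also be indexed directly, without copying its columns out first. The following is only a minimal sketch: the names mean and sd given to the returned values are an assumption added here for illustration (the code above returns an unnamed vector), and the threshold of 20 is arbitrary.

```r
# Sketch: filter on one column of the matrix column in place.
# Naming the returned values (an assumption, not in the code above)
# gives the matrix column names to index by.
agg <- aggregate(mpg ~ am, mtcars,
                 function(i) c(mean = mean(i), sd = sd(i)))

# agg$mpg is a matrix; select its "mean" column and build a row filter
keep <- agg$mpg[, "mean"] > 20

# Ordinary row subsetting then works on the whole data frame
high <- agg[keep, ]
high$am
```

If each(mean, length) names its results mean and length, as the printed tp.mean and tp.length headers in the question suggest, the same pattern would presumably read agg[agg$tp[, "length"] >= 100, ] to keep only the IDs with at least 100 rows.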

The problem is that two functions were passed to the `aggregate` function. This produces columns with a `matrix` class. For example, try `agg = aggregate(mpg ~ am , mtcars, function(i) c(mean(i), sd(i)))` and then look at `str(agg)`. You cannot subset it by standard methods - try `agg$mpg.1` — user20650, Mar 01 '15 at 03:13
`do.call(data.frame, agg)` would "flatten out" your `data.frame`. See the linked duplicate question. — A5C1D2H2I1M1N2O1R2T1, Mar 01 '15 at 06:58
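
The flattening suggested in the comments might look like the sketch below. It reuses the mtcars example; the names mean and sd on the returned values and the threshold of 20 are assumptions made here for illustration.

```r
# Sketch: flatten the matrix column, then subset as usual.
agg <- aggregate(mpg ~ am, mtcars,
                 function(i) c(mean = mean(i), sd = sd(i)))

# data.frame() splits a matrix argument into one ordinary column
# per matrix column, prefixing each with the argument name
flat <- do.call(data.frame, agg)
names(flat)   # "am" "mpg.mean" "mpg.sd"

# The new columns are plain numeric vectors, so $ works
flat[flat$mpg.mean > 20, ]
```

After flattening, the columns that head() prints are real columns of the data frame, so both subset() and the [ operator can refer to them by name.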
