I would like to aggregate a data.frame by an identifier variable called ensg. The data frame looks like this:
chromosome probeset ensg symbol XXA_00 XXA_36 XXB_00
1 X 4938842 ENSMUSG00000000003 Pbsn 4.796123 4.737717 5.326664
I want to compute the mean of each numeric column over rows with the same ensg value. The problem is that I would like to leave the other identity variables, chromosome and symbol, untouched, since they are also identical for rows with the same ensg.
In the end I would like to have a data.frame with the identity columns chromosome, ensg and symbol, plus the mean of each numeric column over rows with the same identifier. I implemented this with ddply, but it is very slow compared to aggregate:
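# Keep the identity columns from the first row of the group and append
# the column means of the numeric columns for that group.
spec.mean <- function(eset.piece)
{
  cbind(eset.piece[1, -numeric.columns], t(colMeans(eset.piece[, numeric.columns])))
}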
mean.eset <- ddply(eset.consensus.grand,.(ensg),spec.mean,.progress="tk")
My first aggregate implementation looks like this:
mean.eset <- aggregate(eset[,numeric.columns], by=list(eset$ensg), FUN=mean, na.rm=TRUE)
and is much faster. The problem with aggregate is that I have to reattach the describing variables afterwards. I have not figured out how to use my custom function with aggregate, since aggregate does not pass data frames to FUN but only vectors.
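To make the reattachment step concrete, here is a minimal sketch of what I mean, run on a toy data frame. The values, the identifiers ENSG_A/ENSG_B and the symbols GeneA/GeneB are all made up, and numeric.columns is assumed to hold column indices. It only illustrates the extra step and is not necessarily the best way to do it.

```r
# Toy data: only the column layout follows my real data, all values are invented.
eset <- data.frame(
  chromosome = c("X", "X", "1"),
  probeset   = c(1001, 1002, 1003),
  ensg       = c("ENSG_A", "ENSG_A", "ENSG_B"),
  symbol     = c("GeneA", "GeneA", "GeneB"),
  XXA_00     = c(4.0, 6.0, 7.0),
  XXA_36     = c(5.0, 5.0, 8.0),
  XXB_00     = c(3.0, 5.0, 9.0),
  stringsAsFactors = FALSE
)
numeric.columns <- 5:7  # assumption: indices of the numeric columns

# Step 1: per-ensg means of the numeric columns.
means <- aggregate(eset[, numeric.columns], by = list(ensg = eset$ensg),
                   FUN = mean, na.rm = TRUE)

# Step 2: one row of describing variables per ensg
# (relies on chromosome and symbol being constant within an ensg).
ids <- unique(eset[, c("chromosome", "ensg", "symbol")])

# Step 3: reattach the describing variables by ensg.
mean.eset <- merge(ids, means, by = "ensg")
```

This gives one row per ensg with chromosome, symbol and the averaged numeric columns, but it needs the separate unique and merge steps that I would like to avoid.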
Is there an elegant way to do this with aggregate? Or is there a faster way to do it with ddply?