I have a data frame, cluster, and one of the columns, cluster$Genes, looks like this:
ENSG00000134684
ENSG00000188846, ENSG00000181163, ENSG00000114391
ENSG00000134684, ENSG00000175390
ENSG00000134684
ENSG00000134684, ENSG00000175390
...
The number of elements in each row in the column is arbitrary. I also have another data frame, expression, that looks like this:
ENSGID a b
ENSG00000134684 1 3
ENSG00000175390 2 0
ENSG00000000419 131.23 108.73
ENSG00000000457 7.11 8.68
ENSG00000000460 15.70 6.59
ENSG00000000938 0 0
ENSG00000000971 0.03 0.07
ENSG00000001036 59.22 58.3
...
... and has around 20000 rows. What I want to do is this:
- For all the elements in each row in cluster$Genes, find the corresponding a and b values
- Calculate the min, max and mean values of a and b (separately) for each row in cluster$Genes
- Create six new columns in the cluster data frame and fill them with the (min.a, max.a, mean.a, min.b, max.b, mean.b) values
I've tried to find some way of doing this, but it's not going well. While googling for help I thought I might use some kind of apply, and I got some code going. I think it's mostly gibberish and totally nonfunctional, and I'm kind of stuck. This is what I got:
exp.lookup = function(genes) {
    genes.split = strsplit(genes, ', ')
    exp.hct = list()
    exp.hke = list()
    for ( gene in genes.split ) {
        exp.hct = c(exp.hct, merge(gene, means$hct, all.x=TRUE))
        exp.hke = c(exp.hke, merge(gene, means$hke, all.x=TRUE))
        return(c(exp.hct, exp.hke))
    }
}
apply(cluster['Genes'], 1, FUN=exp.lookup)
Does anybody have any better ideas that might actually work?
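One possible direction, as a minimal sketch rather than a definitive answer: split each Genes string, look the IDs up in expression with match(), and compute the six statistics per row. The toy data below is modelled on the excerpts above (the third cluster row is made up for illustration), and summarise_genes is a hypothetical helper name, not something from the original code. It assumes the IDs are separated by a comma and optional whitespace, and it simply ignores IDs that are missing from expression.

```r
# Toy data modelled on the question; the third row of cluster is invented
# purely for illustration.
cluster <- data.frame(
  Genes = c("ENSG00000134684",
            "ENSG00000134684, ENSG00000175390",
            "ENSG00000175390, ENSG00000000419"),
  stringsAsFactors = FALSE
)
expression <- data.frame(
  ENSGID = c("ENSG00000134684", "ENSG00000175390", "ENSG00000000419"),
  a = c(1, 2, 131.23),
  b = c(3, 0, 108.73),
  stringsAsFactors = FALSE
)

# Hypothetical helper: min, max and mean of columns a and b for one
# comma-separated string of gene IDs.
summarise_genes <- function(genes, expr) {
  ids  <- strsplit(genes, ",\\s*")[[1]]     # "id1, id2" -> c("id1", "id2")
  rows <- expr[match(ids, expr$ENSGID), ]   # IDs not found give rows of NA
  # na.rm = TRUE drops IDs that were not found; if none are found, min/max
  # return Inf/-Inf with a warning and mean returns NaN.
  c(min.a  = min(rows$a,  na.rm = TRUE),
    max.a  = max(rows$a,  na.rm = TRUE),
    mean.a = mean(rows$a, na.rm = TRUE),
    min.b  = min(rows$b,  na.rm = TRUE),
    max.b  = max(rows$b,  na.rm = TRUE),
    mean.b = mean(rows$b, na.rm = TRUE))
}

# sapply() returns a 6 x n matrix (one column per cluster row); transpose it
# so that each cluster row gets one row of six statistics, then bind them on.
stats <- t(sapply(as.character(cluster$Genes), summarise_genes,
                  expr = expression, USE.NAMES = FALSE))
cluster <- cbind(cluster, stats)
```

Using match() on the ID column avoids calling merge() once per gene, which should matter with around 20000 rows in expression. Whether missing IDs should be dropped silectly or reported is a design choice this sketch does not settle.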