This is my first time posting, so apologies in advance if this is not completely clear. I have a challenge task (not coursework!) with a large data set containing gene IDs (column 1) and expression levels (columns 40-47). I am writing a function that should return an output if the standard deviation of a gene's expression levels is larger than the mean. So far I have been able to print the TRUE/FALSE results, but what I actually want is to print the names of the genes if there are no FALSE outputs for that row.
I am also getting warning messages and can't figure out why.
Please help! z is the data.frame argument of the function; I pass in RNA_data when I call it.
getHighlyVariableGenes <- function(z) {
  for (i in z[40:47]) {
    if (output <- sapply(z[,40:74], sd) > rowMeans(z[,40:74])){
      return(output)
    } else {
      return("")
    }
  }
}
getHighlyVariableGenes(RNA_data)
Which gives me:
[947] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[958] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
[969] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[980] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[991] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[ reached getOption("max.print") -- omitted 62652 entries ]
Warning messages:
1: In sapply(z[, 40:74], sd) > rowMeans(z[, 40:74]) :
longer object length is not a multiple of shorter object length
2: In if (output <- sapply(z[, 40:74], sd) > rowMeans(z[, 40:74])) { :
the condition has length > 1 and only the first element will be used
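The two warnings can be reproduced on a small made-up data frame. This is only a sketch: `toy` and its columns are hypothetical and stand in for the expression columns of RNA_data. It shows that `sapply(..., sd)` on a data frame gives one value per column, while `rowMeans()` gives one value per row, so the two vectors have different lengths and the comparison yields a whole vector rather than a single TRUE/FALSE.

```r
# Hypothetical toy data: 4 rows (genes) and 3 numeric columns (samples).
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(2, 4, 6, 8),
                  c = c(0, 0, 9, 1))

col_sd   <- sapply(toy, sd)  # one standard deviation per COLUMN
row_mean <- rowMeans(toy)    # one mean per ROW

length(col_sd)    # number of columns
length(row_mean)  # number of rows

# Comparing vectors of different lengths recycles the shorter one,
# which is what the first warning is about (suppressed here on purpose).
cmp <- suppressWarnings(col_sd > row_mean)

# The result is a logical vector with more than one element, and if()
# expects a single value, which is what the second warning is about.
length(cmp)

# A per-row standard deviation has the same length as rowMeans().
row_sd <- apply(toy, 1, sd)
length(row_sd)
```

Note that in R 4.2.0 and later, passing a condition of length greater than one to `if()` is an error rather than a warning.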
I have also tried:
getHighlyVariableGenes <- function(z) {
  for (i in z[40:47]) {
    if (output <- sapply(z[,40:74], sd) > rowMeans(z[,40:74])){
      return(z['gene_id'])
    } else {
      return("")
    }
  }
}
Which seemingly prints every value in the first column and isn't omitting the rows with FALSE outputs:
995 ENSG00000066735
996 ENSG00000066739
997 ENSG00000066777
998 ENSG00000066813
999 ENSG00000066827
1000 ENSG00000066855
[ reached getOption("max.print") -- omitted 62652 rows ]
Any suggestion is greatly appreciated!
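
One possible direction is to drop the loop and the `if()` and work on whole vectors instead. The following is a minimal sketch, not a definitive solution. It assumes that "no FALSE output for that row" means the row's standard deviation is larger than its mean, and that the expression levels are in columns 40-47 (the code above uses 40:74 in places, so the range is left as a parameter). The arguments `cols` and `id_col` and the `toy` data are hypothetical and not part of the original function.

```r
# Sketch of a vectorised version; `cols` and `id_col` are assumed arguments.
getHighlyVariableGenes <- function(z, cols = 40:47, id_col = 1) {
  expr <- as.matrix(z[, cols])   # numeric matrix of expression levels
  row_sd   <- apply(expr, 1, sd) # one standard deviation per gene (row)
  row_mean <- rowMeans(expr)     # one mean per gene (row)

  # TRUE where the sd exceeds the mean; rows with NA are treated as FALSE.
  keep <- !is.na(row_sd) & !is.na(row_mean) & row_sd > row_mean

  # Return only the identifiers of the rows that passed the test.
  z[keep, id_col]
}

# Hypothetical toy data: only g3 varies more than its mean.
toy <- data.frame(gene_id = c("g1", "g2", "g3"),
                  s1 = c(1, 10, 0),
                  s2 = c(1, 10, 100),
                  s3 = c(1, 11, 0),
                  stringsAsFactors = FALSE)

getHighlyVariableGenes(toy, cols = 2:4, id_col = "gene_id")
```

With the real data the call would presumably look like `getHighlyVariableGenes(RNA_data)`, adjusting `cols` if the expression levels actually span a different range.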