
This is my first time posting, so apologies in advance if this is not completely clear. I have a challenge task (not coursework!) with a large data set of gene IDs (column 1) and expression levels (columns 40-47). I am writing a function that returns an output if the standard deviation is larger than the mean. So far I have been able to print the results, but I want to print the names of the genes only if there are no FALSE outputs for that row.

I am also getting two warning messages and can't figure out why.

Please help! `z` is the data frame that I pass to the function when I call it.

```r
getHighlyVariableGenes <- function(z) {
  for (i in z[40:47]) {
    if (output <- sapply(z[, 40:74], sd) > rowMeans(z[, 40:74])) {
      return(output)
    } else {
      return("")
    }
  }
}
```

```r
getHighlyVariableGenes(RNA_data)
```

This gives me:

```
 [947]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [958]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
 [969]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [980]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [991]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [ reached getOption("max.print") -- omitted 62652 entries ]
Warning messages:
1: In sapply(z[, 40:74], sd) > rowMeans(z[, 40:74]) :
  longer object length is not a multiple of shorter object length
2: In if (output <- sapply(z[, 40:74], sd) > rowMeans(z[, 40:74])) {     :
  the condition has length > 1 and only the first element will be used
```
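To see where the first warning comes from, here is a small sketch on hypothetical toy data (the `toy` data frame and its columns `s1` to `s3` are made up for illustration; in `RNA_data` the expression values are in columns 40:47). It compares the lengths of the two sides of the `>` comparison:

```r
# Hypothetical toy data: 4 genes (rows) x 3 expression columns.
toy <- data.frame(
  gene_id = paste0("gene", 1:4),
  s1 = c(1, 10, 5, 0),
  s2 = c(1,  0, 5, 0),
  s3 = c(1, 30, 5, 9),
  stringsAsFactors = FALSE
)
expr <- toy[, 2:4]

col_sds   <- sapply(expr, sd)  # sapply() over a data frame works per COLUMN
row_means <- rowMeans(expr)    # rowMeans() works per ROW

length(col_sds)    # 3 (one value per column)
length(row_means)  # 4 (one value per row)

# 4 is not a multiple of 3, so R recycles the shorter vector and warns.
# tryCatch() captures the warning message instead of printing it.
msg <- tryCatch(col_sds > row_means, warning = conditionMessage)
msg  # "longer object length is not a multiple of shorter object length"
```

With the real data the same thing happens at a larger scale: one standard deviation per column is compared against one mean per row.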

I have also tried:

```r
getHighlyVariableGenes <- function(z) {
  for (i in z[40:47]) {
    if (output <- sapply(z[, 40:74], sd) > rowMeans(z[, 40:74])) {
      return(z['gene_id'])
    } else {
      return("")
    }
  }
}
```

This seems to print every value in the first column, without omitting the rows that have FALSE outputs:

```
995   ENSG00000066735
996   ENSG00000066739
997   ENSG00000066777
998   ENSG00000066813
999   ENSG00000066827
1000  ENSG00000066855
[ reached getOption("max.print") -- omitted 62652 rows ]
```
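The second warning also explains this result: `if` can only branch on a single TRUE/FALSE, and R versions before 4.2.0 silently used just the first element of the logical vector (from 4.2.0 on, this is an error instead). That first element was evidently TRUE, so the function returned the whole `z['gene_id']` column on the first pass through the loop. A minimal sketch of the usual alternative, using a hypothetical `is_high` vector in place of the real comparison, is to index with the logical vector instead of branching on it:

```r
# Hypothetical gene ids and a hypothetical TRUE/FALSE result, one per gene.
gene_id <- c("gene1", "gene2", "gene3", "gene4")
is_high <- c(FALSE, TRUE, FALSE, TRUE)

# A logical vector used as an index keeps only the positions that are TRUE,
# so no `if` (and no loop) is needed to filter the genes.
gene_id[is_high]
# [1] "gene2" "gene4"

# If a single yes/no answer really is needed, collapse the vector explicitly:
all(is_high)  # FALSE: not every gene passes
any(is_high)  # TRUE: at least one gene passes
```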

Any suggestion is greatly appreciated!

  • Welcome to Stack Overflow! A couple of things here. First, it's a bit tricky to help without a reproducible example and an expected output; take a look at this post and see if you can edit your question to make it easier for others to help. – DS_UNI Apr 18 '19 at 12:21
  • Sorry, this is the link to the post I mentioned earlier: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Second, you have a couple of problems in your code. You have an assignment inside the condition of the if statement (not sure why), and the condition should be a single TRUE or FALSE rather than a logical vector; that is what's causing the second warning. – DS_UNI Apr 18 '19 at 12:25
  • As for the first warning, it's telling you that the length of `sapply(z[, 40:74], sd)` is not compatible with the length of `rowMeans(z[, 40:74])`. That is indeed the case, since `sapply` here is applied per column and `rowMeans` per row, and I'm guessing that the number of rows in your data does not equal the number of columns. – DS_UNI Apr 18 '19 at 12:28
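Putting these comments together, here is a minimal sketch of the intended function (not the poster's code and not an accepted answer). It assumes the gene names are in a column called `gene_id` and the expression values are in columns 40:47, computes one standard deviation and one mean per row with `apply()` and `rowMeans()`, and returns the gene IDs where the standard deviation is larger. The `cols` argument and the `toy` data frame are hypothetical additions for illustration.

```r
# Sketch only. Assumptions: gene names in a "gene_id" column and expression
# values in the columns given by `cols` (hypothetical argument, default 40:47).
getHighlyVariableGenes <- function(z, cols = 40:47) {
  # Expression values as a numeric matrix, one row per gene
  expr <- as.matrix(z[, cols, drop = FALSE])

  # Row-wise statistics: both vectors have length nrow(z), so the
  # comparison below needs no recycling and raises no warning.
  row_sds   <- apply(expr, 1, sd)
  row_means <- rowMeans(expr)

  # One TRUE/FALSE per gene
  keep <- row_sds > row_means

  # which() drops NA comparisons (e.g. rows with missing values)
  z$gene_id[which(keep)]
}

# Hypothetical toy data: expression values in columns 2:4 instead of 40:47.
toy <- data.frame(
  gene_id = paste0("gene", 1:4),
  s1 = c(1, 10, 5, 0),
  s2 = c(1,  0, 5, 0),
  s3 = c(1, 30, 5, 9),
  stringsAsFactors = FALSE
)

getHighlyVariableGenes(toy, cols = 2:4)
# [1] "gene2" "gene4"
```

With the real data this would be called as `getHighlyVariableGenes(RNA_data)`, assuming columns 40:47 hold only numeric expression values.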
