I have a data frame, counts (60,660 x 1246):
         sample1     sample2    sample3   sample4   sample5
gene1 1615.75292  663.200093 2406.15320 836.38076 1217.8192
gene2   41.93247    8.602831   12.62244  60.14423   22.7755
gene3  697.97280 1198.139790 1033.46252 259.37201  695.9924
gene4  678.35922 1114.457703 1281.96687 466.11782 1265.3798
gene5  365.21832  726.548215  781.80257 268.76955  476.9457
I'm trying to find, for each sample, the list of genes whose values meet a certain threshold. For example, to find the genes with a value greater than 1001, I can use:
counts > 1001
which gives me a TRUE/FALSE matrix:
      sample1 sample2 sample3 sample4 sample5
gene1    TRUE   FALSE    TRUE   FALSE    TRUE
gene2   FALSE   FALSE   FALSE   FALSE   FALSE
gene3   FALSE    TRUE    TRUE   FALSE   FALSE
gene4   FALSE    TRUE    TRUE   FALSE    TRUE
gene5   FALSE   FALSE   FALSE   FALSE   FALSE
I then pass this matrix to:
apply(true_false_matrix, 2, which) %>% lapply(\(x) names(x))
to get, for each sample, the list of genes that have a value greater than 1001. I would also like to find genes whose value falls within a certain range. For example, I tried:
1 < counts && counts < 5
But all I got was a single FALSE value.
I know that there are genes meeting this requirement, so I think I'm going about finding them the wrong way. Is there a way to get a TRUE/FALSE matrix from my initial data frame using two conditions?
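For reference, here is a minimal reproducible sketch of the setup, using only the five genes and samples shown above rather than the full data. It assumes that the element-wise `&` operator (rather than `&&`) is what is needed to keep the matrix shape. The 10 to 50 range is picked only so that this toy data has some hits, and the names `above`, `in_range` and `genes_in_range` are made up for illustration. It also uses base `lapply` over the columns instead of my `apply`/`%>%` pipeline, so it runs without extra packages.

```r
# Toy version of `counts`: only the five genes and samples shown above
counts <- data.frame(
  sample1 = c(1615.75292, 41.93247, 697.97280, 678.35922, 365.21832),
  sample2 = c(663.200093, 8.602831, 1198.139790, 1114.457703, 726.548215),
  sample3 = c(2406.15320, 12.62244, 1033.46252, 1281.96687, 781.80257),
  sample4 = c(836.38076, 60.14423, 259.37201, 466.11782, 268.76955),
  sample5 = c(1217.8192, 22.7755, 695.9924, 1265.3798, 476.9457),
  row.names = paste0("gene", 1:5)
)

# Single threshold: the comparison is element-wise and returns a logical matrix
above <- counts > 1001

# Two conditions combined with the element-wise `&`
# (assumed range of 10 to 50, chosen only so the toy data has matches)
in_range <- counts > 10 & counts < 50

# Gene names per sample: index the row names with each logical column
genes_in_range <- lapply(
  as.data.frame(in_range),
  function(x) rownames(counts)[x]
)

print(in_range)
print(genes_in_range)
```

On this toy data, `in_range` keeps the same 5 x 5 shape as `counts`, and `genes_in_range` is a named list with one character vector per sample (empty where no gene falls in the range).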