I have a data frame called all_genes that has 157 columns in total, the first column being a genes column containing the gene names. The columns of interest run from the 50th to the 157th in steps of 2 (50, 52, 54, 56, etc.) and are named after the samples. These columns contain three types of values: 1, 2 or 3 (missing entries are NA), and the same row (the same gene) can have all three values across different samples.
For example, the row of gene X has a value of 1 in the 50th column but a value of 2 in the 52nd column.
What I want is to extract all rows depending on the values in these even columns. To give a better idea, here's what the data frame looks like:
Now, I have written the following code to extract, for example, the rows with value 1:
# extracting rows of value "1" from column 50 to 157, by taking into account only the even columns
df <- all_genes[which(all_genes[, seq(50, 157, 2)] == 1), ]
# removing rows that are NA in every column from 50 to 157
df <- df[rowSums(is.na(df[, 50:157])) != ncol(df[, 50:157]), ]
However, what I get is the following:
As you can see, the first column contains only values of 1, but the other columns also contain values of 2 (and 3). I think my code only looks at the 50th column and ignores genes whose value in the 50th column is not 1, even though the same gene can have a value of 2 in the 50th column but 1 in the 52nd column. To confirm this, I checked for that case (please copy-paste the following link, since I don't have enough reputation):
i.stack.imgur.com/rZQ2E.png
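The suspicion above can be checked on a small made-up data frame (toy, m, idx, bad and rows below are hypothetical names, not part of all_genes). As far as I can tell, comparing a data frame with a scalar yields a logical matrix, and which() on a matrix returns linear, column-major positions rather than row numbers. Positions larger than the number of rows then index rows that do not exist, which come back as all-NA rows and are dropped by the rowSums(is.na(...)) step, so only matches in the first selected column survive. This is a sketch of the mechanism, not a full fix:

```r
# Hypothetical toy data: 3 genes, 2 sample columns (NA = missing)
toy <- data.frame(s1 = c(1, 2, NA),
                  s2 = c(2, 1, 1))

# Comparing a data frame with a scalar gives a logical matrix (NA stays NA)
m <- toy == 1

# which() on a matrix returns linear, column-major positions, not row numbers:
# position 1 is (row 1, s1); positions 5 and 6 are (rows 2 and 3, s2)
idx <- which(m)
print(idx)  # 1 5 6

# Rows 5 and 6 do not exist, so they come back as all-NA rows...
bad <- toy[idx, ]
# ...and the NA-removal step then drops them, leaving only row 1
bad <- bad[rowSums(is.na(bad)) != ncol(bad), ]
print(nrow(bad))  # 1

# Asking which() for array indices recovers the real row numbers instead
rows <- sort(unique(which(m, arr.ind = TRUE)[, "row"]))
print(rows)  # 1 2 3
```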
Could you please tell me whether my code is working correctly or whether I should change something?
The same thing happens if I change the value 1 to 2 in my code: I still get only values of 2 in the 50th column, but all kinds of values in the other columns.
Thanks in advance.
EDIT: As requested by @tobiasegli_te, here's a small reproducible data frame:
structure(list(`#00e41e6a-9fe7-44f9-978b-7b05b179506a` = c(1,
1, NA, NA, NA, NA, NA, NA, NA, 1, 2, 1, NA, NA, 2, NA, 3, 1,
1, NA, NA, NA, 2, NA, 1, NA, NA, NA, NA, 1, 1, 1, NA, NA, 1,
NA, NA), `#aca312ab-6dbd-4183-8b22-8f37834f3426` = c(NA, NA,
NA, 1, NA, 1, NA, 2, 1, NA, 2, 1, 1, 1, NA, NA, NA, 1, 1, 1,
NA, 1, 2, 1, NA, 1, NA, 1, NA, NA, 1, NA, 1, NA, 1, 1, 1), `#0730216b-c201-443c-9092-81e23fd13c6c` = c(NA,
NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, 1, NA, NA, 1, NA,
NA, NA, NA, 1, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA,
2, 1, NA, NA), `#acd5ceef-c5cf-4e95-9394-c50fdbc70c8d` = c(NA,
NA, 2, NA, 2, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA,
NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, 1, NA, 1, NA, NA, NA,
1, NA, NA)), .Names = c("#00e41e6a-9fe7-44f9-978b-7b05b179506a",
"#aca312ab-6dbd-4183-8b22-8f37834f3426", "#0730216b-c201-443c-9092-81e23fd13c6c",
"#acd5ceef-c5cf-4e95-9394-c50fdbc70c8d"), row.names = c(1L, 2L,
4L, 6L, 8L, 10L, 11L, 16L, 20L, 22L, 23L, 30L, 32L, 37L, 38L,
43L, 45L, 46L, 47L, 49L, 50L, 53L, 62L, 64L, 65L, 67L, 68L, 69L,
70L, 71L, 73L, 74L, 76L, 77L, 79L, 80L, 81L), class = "data.frame")
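If instead "rows of value 1" should mean genes whose non-missing sample values are all 1, a stricter variant compares the number of matches with the number of non-NA cells in each row. This is again a sketch under that assumption; rows_only_value and the toy data are hypothetical. For the reproducible data frame above, which contains only the four sample columns, cols would presumably be 1:4:

```r
# Hypothetical helper: keep rows whose non-NA values in `cols` all equal `value`
rows_only_value <- function(df, cols, value) {
  sub <- df[, cols, drop = FALSE]
  n_match <- rowSums(sub == value, na.rm = TRUE)  # cells equal to `value`
  n_obs   <- rowSums(!is.na(sub))                 # non-missing cells
  # require at least one observation so all-NA rows are dropped
  df[n_obs > 0 & n_match == n_obs, , drop = FALSE]
}

# Hypothetical toy data shaped like the reproducible example (samples only)
toy <- data.frame(s1 = c(1, 1, NA, 2, NA),
                  s2 = c(NA, 2, 1, 2, NA),
                  s3 = c(1, NA, NA, NA, NA))

only_ones <- rows_only_value(toy, cols = 1:3, value = 1)
print(rownames(only_ones))  # "1" "3"

only_twos <- rows_only_value(toy, cols = 1:3, value = 2)
print(rownames(only_twos))  # "4"
```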