I have a data frame with repeated rows for the same gene_id. I want to collapse the repeated rows into a single row per gene_id, keeping the highest count in each sample column (the DO... columns). How can I do that?
Sample data (from the comments):
structure(list(gene_id = c("ENSG00000000003", "ENSG00000000003",
"ENSG00000000003", "ENSG00000000003", "G00000000003", "G00000000003",
"G00000000003", "G00000000003", "G00000000003", "G00000000003"
), DO221539 = c(681L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221540 = c(148L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221541 = c(650L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221542 = c(258L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), DO221543 = c(57L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), DO221544 = c(224L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), DO221545 = c(60L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221546 = c(161L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO224575 = c(15L, 0L, 0L,
0L, 0L, 949L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-10L))
I want the output to be:
structure(list(gene_id = "ENSG00000000003", DO221539 = 681L, DO221540 = 148L,
DO221541 = 650L, DO221542 = 258L, DO221543 = 57L, DO221544 = 224L,
DO221545 = 60L, DO221546 = 161L, DO224575 = 949L), class = "data.frame", row.names = c(NA,
-1L))
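
To make the intended operation concrete, here is a minimal sketch of "one row per gene_id, maximum of every sample column" in base R. It is only an illustration under assumptions: `df` and `result` are made-up names, the toy data is a shortened, hypothetical stand-in for the sample above, and rows are assumed to belong together only when their gene_id strings match exactly (so "G00000000003" and "ENSG00000000003" would be treated as different genes unless the IDs are normalized first).

```r
# Hypothetical toy data in the same shape as the sample (two sample columns only)
df <- data.frame(
  gene_id  = c("ENSG00000000003", "ENSG00000000003", "ENSG00000000003"),
  DO221539 = c(681L, 0L, 0L),
  DO224575 = c(15L, 0L, 949L)
)

# Group by gene_id and take the maximum of every other column.
# Note: the formula interface drops rows containing NA by default.
result <- aggregate(. ~ gene_id, data = df, FUN = max)

print(result)
```

This gives one row per distinct gene_id, with each sample column holding the largest count seen for that gene.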