How to select the highest value of each column in R

Question

I have a data frame that contains repeated rows. I want to remove the repeats and, for each gene, keep a single row holding the highest count in each sample_id column. How can I do that?

Sample data (from the comments):

structure(list(gene_id = c("ENSG00000000003", "ENSG00000000003", 
"ENSG00000000003", "ENSG00000000003", "ENSG00000000003", "ENSG00000000003", 
"ENSG00000000003", "ENSG00000000003", "ENSG00000000003", "ENSG00000000003"
), DO221539 = c(681L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221540 = c(148L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221541 = c(650L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221542 = c(258L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L), DO221543 = c(57L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), DO221544 = c(224L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), DO221545 = c(60L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO221546 = c(161L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DO224575 = c(15L, 0L, 0L, 
0L, 0L, 949L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-10L))

I want the output to be:

structure(list(gene_id = "ENSG00000000003", DO221539 = 681L, DO221540 = 148L, 
DO221541 = 650L, DO221542 = 258L, DO221543 = 57L, DO221544 = 224L, 
DO221545 = 60L, DO221546 = 161L, DO224575 = 949L), class = "data.frame", 
row.names = c(NA, -1L))

I have 816541 observations of 325 variables. Each column is a sample_id and each row is a gene_name. For each gene, each sample's greatest value is hidden among the repeated rows, and I need to pick those values out. — Anna, Oct 30 '18 at 00:37
Yes, I want to remove duplicates and select the highest value for each sample per gene (one row per gene, holding the highest value for each sample). — Anna, Oct 30 '18 at 00:40
From your comment, I am guessing that you need the max per row? It would be better if you showed a small example in your post with the expected output. Try `library(dplyr); df1 %>% group_by(sample_id) %>% summarise_all(max)` — akrun, Oct 30 '18 at 00:41
As it is a bigger dataset, you can also use `data.table`: `library(data.table); setDT(df1)[, lapply(.SD, max), by = sample_id]` — akrun, Oct 30 '18 at 00:43
Sorry, I have a .txt file with the head of the data frame. How can I attach a data frame from my computer? — Anna, Oct 30 '18 at 00:50
Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. One example: `dput(head(x,n=20))`. — r2evans, Oct 30 '18 at 00:50
@Anna If you have read the data into R, use `dput` to show a small example, i.e. `dput(droplevels(df1[1:4, 1:4]))` — akrun, Oct 30 '18 at 00:51
gene_id         DO221539 DO221540 DO221541 DO221542 DO221543 DO221544 DO221545 DO221546 DO224575
ENSG00000000003      681      148      650      258       57      224       60      161       15
ENSG00000000003        0        0        0        0        0        0        0        0        0
ENSG00000000003        0        0        0        0        0        0        0        0        0
ENSG00000000003        0        0        0        0        0        0        0        0        0
ENSG00000000003        0        0        0        0        0        0        0        0        0
ENSG00000000003        0        0        0        0        0        0        0        0      949
ENSG00000000003        0        0        0        0        0        0        0        0        0
ENSG00000000003        0        0        0        0        0        0        0        0        0
ENSG00000000003        0        0        0        0        0        0        0        0        0
ENSG00000000003        0        0        0        0        0        0        0        0        0
— Anna, Oct 30 '18 at 00:55
Not in a comment, please post the output from `dput(...)` into your question. — r2evans, Oct 30 '18 at 00:56
Anna, I just posted a suggested edit to your question. That is one way to provide sample data for people trying to help. Comments are horrible for significant data and code ... I think I inferred the columns correctly, please correct me if I got it wrong. — r2evans, Oct 30 '18 at 01:08
I suggest you take the sample data I copied from your comment and please make an R object that is your expected output. That is, with this 10x10 `data.frame`, your expected output could be a 10x1 frame, a vector, a 1x10 frame, or ... something. Using the numbers in that data, construct what you need resulting from this process. — r2evans, Oct 30 '18 at 01:10
Thanks akrun, the code `library(dplyr); df1 %>% group_by(sample_id) %>% summarise_all(max)` worked very well for me. — Anna, Oct 30 '18 at 01:32

score 0 · Accepted Answer · answered Oct 30 '18 at 01:32

We can group by `gene_id` and take the max of each column with `summarise_all`.

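library(tidyverse)  # loads dplyr, which provides group_by() and summarise_all()

# One row per gene_id, holding the maximum of every sample column
df1 %>% 
   group_by(gene_id) %>% 
   summarise_all(max)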
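
As a quick check, the grouped maximum can be run against the sample data from the question. The sketch below is not part of the original answer. It rebuilds a trimmed version of that data (three of the nine sample columns, assuming every row belongs to the same gene, as in the comment table) and uses `across()`, which superseded `summarise_all()` in dplyr 1.0.0. The names `res` and `res_base` are only illustrative, and the base R `aggregate()` call is offered as a dependency-free alternative rather than something the answer proposed.

```r
library(dplyr)

# Trimmed copy of the question's sample data: one gene, ten repeated rows,
# three of the nine sample columns (assumption: all rows share one gene_id)
df1 <- data.frame(
  gene_id  = rep("ENSG00000000003", 10),
  DO221539 = c(681L, rep(0L, 9)),
  DO221546 = c(161L, rep(0L, 9)),
  DO224575 = c(15L, 0L, 0L, 0L, 0L, 949L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One row per gene_id; each sample column collapses to its maximum
res <- df1 %>%
  group_by(gene_id) %>%
  summarise(across(everything(), max))

print(res)

# Base R alternative: aggregate every other column by gene_id
res_base <- aggregate(. ~ gene_id, data = df1, FUN = max)
```

If the real data contains missing values, `max` returns `NA` for any group that includes one; passing `~ max(.x, na.rm = TRUE)` to `across()` is one way to ignore them.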

akrun
