I have a data frame in R with two columns: 'Gene' and 'Expression'. Some genes appear in more than one row, and those duplicate rows have different Expression values. I want to collapse the duplicates so that there is just one row per gene, keeping the row whose Expression value is largest in absolute terms. See the example below.
For this data frame:
df <- data.frame(Gene=c("AKT","MYC","MYC","RAS","RAS","RAS","TP53"),
Expression=c(3,2,6,1,-4,-1,-3))
Gene Expression
1 AKT 3
2 MYC 2
3 MYC 6
4 RAS 1
5 RAS -4
6 RAS -1
7 TP53 -3
I'd like this output:
Gene Expression
1 AKT 3
2 MYC 6
3 RAS -4
4 TP53 -3
I can identify the duplicated genes using
duplicated(df$Gene)
But I'm not sure how to exclude those duplicates of lesser absolute value.
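To show where I'm stuck, here is a minimal sketch of what that call gives on the example data. The names `dup` and `first_only` are just placeholders I made up for illustration. As far as I can tell, `duplicated()` flags every occurrence after the first, so dropping the flagged rows keeps whichever row happens to come first for each gene, not the one with the largest absolute Expression.

```r
# Rebuild the example data from above
df <- data.frame(Gene = c("AKT", "MYC", "MYC", "RAS", "RAS", "RAS", "TP53"),
                 Expression = c(3, 2, 6, 1, -4, -1, -3))

# duplicated() marks each repeat of a gene (2nd occurrence onwards) as TRUE
dup <- duplicated(df$Gene)
print(dup)

# Dropping the flagged rows keeps only the first row seen for each gene,
# regardless of its Expression value
first_only <- df[!dup, ]
print(first_only)
```

With this data, that leaves MYC with 2 and RAS with 1 rather than the 6 and -4 I'm after.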
PS: I'm new to this R malarkey.