I made up a test data frame like this:
```r
gene <- as.factor(c('A','B','B','B','C','C','D'))
location <- as.integer(c(1,4,5,6,2,3,9))
df <- data.frame(gene, location)
```

```
> df
  gene location
1    A        1
2    B        4
3    B        5
4    B        6
5    C        2
6    C        3
7    D        9
```
I would like to keep one row per gene (A, B, C, D): when a gene appears more than once, keep only the row with the highest location and drop the others. For example, for gene B only the row with location 6 should be kept, and for gene C only the row with location 3.
So the end result should look like this:
```
  gene location
1    A        1
4    B        6
6    C        3
7    D        9
```
Does anyone know how I can do this?
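One possible base-R sketch, assuming `location` is numeric and that exactly one row per gene is wanted even if two rows tie for the highest location, is to sort by gene and then by descending location, and then drop every row whose gene has already been seen:

```r
# Rebuild the example data from the question
gene <- as.factor(c('A','B','B','B','C','C','D'))
location <- as.integer(c(1,4,5,6,2,3,9))
df <- data.frame(gene, location)

# Sort by gene, then by location in decreasing order, so the
# highest-location row of each gene comes first within its group
ordered <- df[order(df$gene, -df$location), ]

# Keep only the first occurrence of each gene (its highest location)
result <- ordered[!duplicated(ordered$gene), ]
result
```

On the example data this keeps the original rows 1, 4, 6 and 7, which matches the desired result above. If all tied maximum rows should be kept instead, `df[df$location == ave(df$location, df$gene, FUN = max), ]` is an alternative that also preserves the original row order.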