Find duplicates in column x, then remove row that has lower value in column y

Question

Some values of x appear only once and others appear twice. When a value of x is duplicated, I want to remove the row with the lower y value and keep the one with the higher y.

ex <- data.frame('x'= c(1:5, 1:3,5:6), 'y'= c(70,73,72,49,60,14,50,46,13,29))
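The "find duplicates" step from the title can be done in base R with `duplicated()`. This is a minimal sketch using only the `ex` data above; the name `dup_x` is illustrative.

```r
ex <- data.frame('x'= c(1:5, 1:3,5:6), 'y'= c(70,73,72,49,60,14,50,46,13,29))

# duplicated() is TRUE for the second and later occurrences of a value,
# so unique() of those entries gives each x that appears more than once
dup_x <- unique(ex$x[duplicated(ex$x)])
print(dup_x)
#> [1] 1 2 3 5

# Show every row whose x value is duplicated (8 of the 10 rows here)
print(ex[ex$x %in% dup_x, ])
```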

Original df:

   x  y
1  1 70
2  2 73
3  3 72
4  4 49
5  5 60
6  1 14
7  2 50
8  3 46
9  5 13
10 6 29

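Desired result (one row per value of x, keeping the row with the higher y):

  x  y
1 1 70
2 2 73
3 3 72
4 4 49
5 5 60
6 6 29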

Thank you, but this seems to remove any other columns. What if I have another column z that I want to keep in this process? — W Hampton, Apr 29 '20 at 07:45
No, it does not. Did you try it? `ex <- data.frame('x'= c(1:5, 1:3,5:6), 'y'= c(70,73,72,49,60,14,50,46,13,29), z = 1:10)` and `ex %>% group_by(x) %>% slice(which.max(y))` — Ronak Shah, Apr 29 '20 at 07:49
You are right. I get a warning about implicit NAs, but it seems to have worked. Thank you! — W Hampton, Apr 29 '20 at 07:56
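The comment thread uses dplyr's `group_by()` with `slice(which.max(y))`. A base R alternative that also keeps extra columns such as `z` is to sort by `x` and decreasing `y`, then drop repeated `x` values with `duplicated()`. This is a sketch, not the commenters' code; the names `ex_sorted` and `res` are illustrative.

```r
ex <- data.frame(x = c(1:5, 1:3, 5:6),
                 y = c(70, 73, 72, 49, 60, 14, 50, 46, 13, 29),
                 z = 1:10)

# Sort by x, and within each x by decreasing y, so the row with the
# larger y comes first in every group
ex_sorted <- ex[order(ex$x, -ex$y), ]

# Keep only the first row per x; all columns (including z) are retained
res <- ex_sorted[!duplicated(ex_sorted$x), ]
print(res)
```

Because `order()` leaves unresolved ties in their original order, a group where two rows share the same maximum `y` keeps only the first of them, similar to `which.max()`.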

score 1 · Accepted Answer · answered Apr 29 '20 at 07:33


You can use `aggregate` to find the largest y for each value of x:

aggregate(y ~ x, ex, max)
#  x  y
#1 1 70
#2 2 73
#3 3 72
#4 4 49
#5 5 60
#6 6 29
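`aggregate(y ~ x, ...)` returns only the columns named in the formula, so any other column (such as the `z` from the comments) is dropped. One way to recover the full rows is to merge the aggregated result back into the original data on `x` and `y`. A sketch under that assumption, with an illustrative `z` column; `best` and `res` are hypothetical names:

```r
ex <- data.frame(x = c(1:5, 1:3, 5:6),
                 y = c(70, 73, 72, 49, 60, 14, 50, 46, 13, 29),
                 z = 1:10)

# Largest y per x, as in the aggregate call above (columns x and y only)
best <- aggregate(y ~ x, ex, max)

# merge() joins on the shared columns x and y, bringing back z
# for exactly the rows that hold each group's maximum
res <- merge(best, ex)
print(res)
```

If two rows in a group tie for the maximum `y`, both survive the merge.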


GKi

