Some values of x have a single observation and others have two. Where a value of x appears twice, I want to remove the row with the lower y value and keep the row with the higher one.

ex <- data.frame('x'= c(1:5, 1:3,5:6), 'y'= c(70,73,72,49,60,14,50,46,13,29))

W Hampton
  • `ex %>% group_by(x) %>% slice(which.max(y))` in `dplyr` – Ronak Shah Apr 29 '20 at 07:34
  • Thank you, but this seems to remove any other columns. What if I have another column z that I want to keep in this process? – W Hampton Apr 29 '20 at 07:45
  • No, it does not. Did you try it? `ex <- data.frame('x'= c(1:5, 1:3,5:6), 'y'= c(70,73,72,49,60,14,50,46,13,29), z = 1:10)` and `ex %>% group_by(x) %>% slice(which.max(y))` – Ronak Shah Apr 29 '20 at 07:49
  • You are right. I get a warning about implicit NAs, but it seems to have worked. Thank you! – W Hampton Apr 29 '20 at 07:56

1 Answer

You can use `aggregate` to find the largest `y` for each value of `x`:

aggregate(y ~ x, ex, max)
#  x  y
#1 1 70
#2 2 73
#3 3 72
#4 4 49
#5 5 60
#6 6 29
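
With the formula `y ~ x`, `aggregate` returns only the `x` and `y` columns. If other columns need to be kept, as raised in the comments, one possible base R sketch is to filter whole rows with `ave` instead. The column `z` here is just the illustrative extra column from the comments, and `res` is a name chosen for this example:

```r
# Sample data from the question, plus the extra column z from the comments
ex <- data.frame(x = c(1:5, 1:3, 5:6),
                 y = c(70, 73, 72, 49, 60, 14, 50, 46, 13, 29),
                 z = 1:10)

# ave() returns, for every row, the maximum y within that row's x group.
# Keeping only the rows where y equals that maximum drops the lower duplicate
# while retaining all other columns of the surviving rows.
res <- ex[ex$y == ave(ex$y, ex$x, FUN = max), ]
res
```

Note that if two rows in the same `x` group tie on the maximum `y`, this keeps both of them, whereas `slice(which.max(y))` keeps only the first.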
GKi