Selecting distinct rows in dplyr

Question

dat <- data.frame(loc.id = rep(1:2, each = 3), 
              year = rep(1981:1983, times = 2), 
              prod = c(200,300,400,150,450,350),
              yld = c(1200,1250,1200,3000,3200,3200))

If I want to select for each loc.id distinct values of yld, I do this:

library(dplyr)

dat %>% group_by(loc.id) %>% distinct(yld)

    loc.id     yld
    <int>     <dbl>
      1      1200
      1      1250
      2      3000
      2      3200

However, what I want is this: within each loc.id, if several years have the same yld, keep only the row with the lowest prod value. I also want the prod and year columns included in the final data frame, which should look like this:

    loc.id    year   prod     yld 
      1        1981   200     1200
      1        1982   300     1250
      2        1981   150     3000
      2        1983   350     3200

Thank you. There was a typo – 89_Simple, Jun 14 '18 at 15:46

akrun · Accepted Answer · 2018-06-14T15:43:55.367

4

We can arrange by 'prod', group by 'loc.id' and 'yld', and then slice the first observation of each group, which is the one with the lowest 'prod'.

dat %>% 
    arrange(loc.id, prod) %>% 
    group_by(loc.id, yld) %>%
    slice(1)
# A tibble: 4 x 4
# Groups:   loc.id, yld [4]
#  loc.id  year  prod   yld
#   <int> <int> <dbl> <dbl>
#1      1  1981   200  1200
#2      1  1982   300  1250
#3      2  1981   150  3000
#4      2  1983   350  3200
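
As a variation on the code above, the same rows can probably also be obtained with distinct() itself. This is only a sketch, not part of the original answer: it assumes distinct() is called with .keep_all = TRUE so that year and prod are retained, and it relies on distinct() keeping the first row it encounters for each loc.id/yld combination, which is why the data are sorted by prod first. The name res_distinct is made up for illustration.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(loc.id = rep(1:2, each = 3),
                  year = rep(1981:1983, times = 2),
                  prod = c(200, 300, 400, 150, 450, 350),
                  yld = c(1200, 1250, 1200, 3000, 3200, 3200))

# Sort so the lowest prod comes first within each loc.id, then keep
# the first row for every distinct loc.id/yld pair.
# .keep_all = TRUE retains the year and prod columns as well.
res_distinct <- dat %>%
    arrange(loc.id, prod) %>%
    distinct(loc.id, yld, .keep_all = TRUE)

res_distinct
```

Without the arrange() step, distinct() would keep whichever row happens to come first in the original order, which for loc.id 2 and yld 3200 is the 1982 row with prod 450 rather than the 1983 row with prod 350.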

edited Jun 14 '18 at 15:43

answered Jun 14 '18 at 15:42

akrun


2

Or `dat %>% group_by(loc.id, yld) %>% slice(which.min(prod))`, as in the dupe target – Axeman Jun 15 '18 at 08:07
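
The one-liner in the comment above can be run on the question's data roughly as follows. This is a minimal sketch: the trailing ungroup() and the name res_which are additions for illustration and are not in the comment.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(loc.id = rep(1:2, each = 3),
                  year = rep(1981:1983, times = 2),
                  prod = c(200, 300, 400, 150, 450, 350),
                  yld = c(1200, 1250, 1200, 3000, 3200, 3200))

# Within each loc.id/yld group, which.min(prod) gives the position of
# the smallest prod, and slice() keeps just that row.
res_which <- dat %>%
    group_by(loc.id, yld) %>%
    slice(which.min(prod)) %>%
    ungroup()  # added here so later steps are not grouped

res_which
```

Unlike the arrange-then-slice approach, this does not need a sort step. which.min() returns only the first position of the minimum, so tied prod values within a group still yield a single row. In dplyr 1.0.0 and later, slice_min(prod, n = 1, with_ties = FALSE) should behave much the same way.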
