Summarise all using which on other column in dplyr

Question

For some reason, I could not find a solution using the summarise_all function for the following problem:

df <- data.frame(A = c(1,2,2,3,3,3,4,4), B = 1:8, C = 8:1, D = c(1,2,3,1,2,5,10,9))

desired results:

df %>% 
    group_by(A) %>% 
    summarise(B = B[which.min(D)],
              C = C[which.min(D)],
              D = D[which.min(D)])
# A tibble: 4 x 4
      A     B     C     D
  <dbl> <int> <int> <dbl>
1     1     1     8     1
2     2     2     7     2
3     3     4     5     1
4     4     8     1     9

What I tried:

df %>% 
    group_by(A) %>% 
    summarise_all(.[which.min(D)])

In words, I want to group by a variable and find for each column the value that belongs to the minimum value of another column. I could not find a solution for this using summarise_all. I am searching for a dplyr approach.

eipi10 · Accepted Answer · 2017-09-22T16:22:58.190

You can just filter down to the row that has a minimum value of D for each level of A. The code below assumes there is only one minimum row in each group.

df %>% 
  group_by(A) %>% 
  arrange(D) %>% 
  slice(1)

      A     B     C     D
1     1     1     8     1
2     2     2     7     2
3     3     4     5     1
4     4     8     1     9

If there can be multiple rows with minimum D, then:

df <- data.frame(A = c(1,2,2,3,3,3,4,4), B = 1:8, C = 8:1, D = c(1,2,3,1,2,5,9,9))

df %>% 
  group_by(A) %>% 
  filter(D == min(D))

      A     B     C     D
1     1     1     8     1
2     2     2     7     2
3     3     4     5     1
4     4     7     2     9
5     4     8     1     9

Or `df %>% arrange(D) %>% distinct(A, .keep_all = TRUE)` .. I think. — Frank, Sep 22 '17 at 16:22

score 1 · Answer 2 · answered Sep 22 '17 at 16:24

You need filter - any time you're trying to drop some rows and keep others, that's the verb you want.

df %>% group_by(A) %>% filter(D == min(D))
#> # A tibble: 4 x 4
#> # Groups:   A [4]
#>       A     B     C     D
#>   <dbl> <int> <int> <dbl>
#> 1     1     1     8     1
#> 2     2     2     7     2
#> 3     3     4     5     1
#> 4     4     8     1     9

Summarise all using which on other column in dplyr

2 Answers2