3

I am working with huge dataset that has thousands of rows and multiple columns. I want to use Dplyr to find maximum values of a group in a column and mutate the result in another column in the same table. Here is an example of my data:

df <- tibble(a = rep(letters[1:3], each = 3), b = seq(0.1,0.9, length.out = 9))

I want to find maximum value in each group of column a. I know I can present maximum value in each group in a table using this code:

df %>% group_by(a) %>% summarise_all(list(~ max(.)))

which produces:

# A tibble: 3 x 2
  a         b
  <chr> <dbl>
1 a       0.2
2 b       0.5
3 c       0.8

But that is not what I want. I desire a table that keeps the raw data, but with an additional column c such that each value in column b will have corresponding value in column c that shows maximum value in each group that each data point in column b belongs. The desired result that I want is below:

# A tibble: 9 x 3
  a         b     c
  <chr> <dbl> <dbl>
1 a       0.1   0.3
2 a       0.2   0.3
3 a       0.3   0.3
4 b       0.4   0.6
5 b       0.5   0.6
6 b       0.6   0.6
7 c       0.7   0.9
8 c       0.8   0.9
9 c       0.9   0.9

I will be glad for help. Thanks.

William
  • 340
  • 7
  • 17

1 Answers1

7

Using mutate rather than summarize should keep all the data.

library(dplyr)

df <- tibble(a = rep(letters[1:3], each = 3), b = seq(0.1,0.9, length.out = 9))

df %>% 
  group_by(a) %>% 
  mutate(group_max = max(b)) %>% 
  ungroup()

#> # A tibble: 9 x 3
#>   a         b group_max
#>   <chr> <dbl>     <dbl>
#> 1 a       0.1       0.3
#> 2 a       0.2       0.3
#> 3 a       0.3       0.3
#> 4 b       0.4       0.6
#> 5 b       0.5       0.6
#> 6 b       0.6       0.6
#> 7 c       0.7       0.9
#> 8 c       0.8       0.9
#> 9 c       0.9       0.9

Created on 2020-01-10 by the reprex package (v0.3.0)

mrhellmann
  • 5,069
  • 11
  • 38