library(tidyverse)
df <- tibble(col1 = c("a", "a", "b", "b"),
             col2 = c(2, NA, 10, 8))
#> # A tibble: 4 x 2
#>   col1   col2
#>   <chr> <dbl>
#> 1 a         2
#> 2 a        NA
#> 3 b        10
#> 4 b         8

I've got the data frame above that I'd like to perform the following logic on:

  • Group by col1
  • Within each col1 group, determine the largest col2 value (ignoring NA)
  • Populate col3 with that largest col2 value for every row in the group

What you'd end up with is the data frame below.

#> # A tibble: 4 x 3
#>   col1   col2  col3
#>   <chr> <dbl> <dbl>
#> 1 a         2     2
#> 2 a        NA     2
#> 3 b        10    10
#> 4 b         8    10

My attempt is below. I understand it doesn't work because the way I've written dplyr::pull() doesn't pick up the grouping I intend. How do I get dplyr::pull() to respect that grouping, or is there a better approach to this problem?

df %>% 
  group_by(col1) %>% 
  mutate(col3 = top_n(., 1, col2) %>% pull(col2))
#> # A tibble: 4 x 3
#> # Groups:   col1 [2]
#>   col1   col2  col3
#>   <chr> <dbl> <dbl>
#> 1 a         2     2
#> 2 a        NA    10
#> 3 b        10     2
#> 4 b         8    10
    I think you're getting messed up by trying to use `dplyr` functions in places where base ones would suffice. `top_n` returns a data frame—if all you need is the 1 largest value of `col2`, why not just `max(col2)`? `pull` is the same as just `$` or `[[` and also not needed here – camille Oct 28 '19 at 14:09
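
To illustrate the comment above, here is a minimal sketch (the names `grouped` and `per_group_top` are introduced only for illustration) that evaluates the inner expression from the question on its own. It assumes that, inside the pipe, the `.` in the attempt refers to the whole grouped data frame rather than to the current group, which is consistent with the output shown in the question.

```r
library(dplyr)

df <- tibble(col1 = c("a", "a", "b", "b"),
             col2 = c(2, NA, 10, 8))

grouped <- df %>% group_by(col1)

# top_n() on the grouped data frame keeps the top row of *each* group,
# so pull() returns one value per group, not one value for the current group.
per_group_top <- grouped %>% top_n(1, col2) %>% pull(col2)
print(per_group_top)

# In the question's attempt, mutate() receives this same vector in every
# group, so its elements are laid out row by row inside each group instead
# of one value being repeated across the group.

# max() works on the current group's col2 directly and returns a single
# value, which mutate() recycles across the group's rows.
fixed <- grouped %>% mutate(col3 = max(col2, na.rm = TRUE)) %>% ungroup()
print(fixed)
```

Since `max()` already returns a single value per group, neither `top_n()` nor `pull()` is needed for this task.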

1 Answer

You're close. The function to use is max(), which with na.rm = TRUE returns the maximum value after removing the NAs.

df %>% 
  group_by(col1) %>% 
  mutate(col3 = max(col2, na.rm = TRUE))

# A tibble: 4 x 3
# Groups:   col1 [2]
#  col1   col2  col3
#  <chr> <dbl> <dbl>
#1 a         2     2
#2 a        NA     2
#3 b        10    10
#4 b         8    10
deepseefan
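
One edge case worth noting with the max() approach: if every col2 value in a group is NA, `max(col2, na.rm = TRUE)` returns `-Inf` with a warning. The sketch below is one possible guard, not part of the answer above; the extra group "c" and the name `out` are hypothetical and added only to show the behaviour.

```r
library(dplyr)

# Same data as the question, plus a hypothetical group "c" that is all NA
df <- tibble(col1 = c("a", "a", "b", "b", "c"),
             col2 = c(2, NA, 10, 8, NA))

out <- df %>%
  group_by(col1) %>%
  # Return NA for an all-NA group instead of -Inf; otherwise take the max
  mutate(col3 = if (all(is.na(col2))) NA_real_ else max(col2, na.rm = TRUE)) %>%
  ungroup()

print(out)
```

The `if` is evaluated once per group, so each group gets either a single NA or its own maximum, recycled across its rows.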