
I have a tibble with the columns name, region, size and num.

I need to get the top name in each region by the sum of num.

    library(tibble)  # tribble()
    library(dplyr)   # group_by(), summarize(), top_n(), %>%

    data <- tribble(
      ~name, ~region, ~size, ~num,
      "joe", "east", "small", 10,
      "moe", "east", "small", 20,
      "doe", "east", "small", 30,
      "joe", "west", "small", 30,
      "moe", "west", "small", 20,
      "doe", "west", "small", 10
    )

    # Attempt: total num per name/region pair, then keep the top row
    result <- data %>%
      group_by(name, region) %>%
      summarize(total = sum(num)) %>%
      top_n(1)

    result

This gives the top 1 for each name/region pair (6 rows - too many), but I need the top name for each region (east, doe, 30 and west, joe, 30). What do I need to add?

Sotos
Old Man

2 Answers


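You need to add another group_by to capture the region groups. After summarize(), the result is still grouped by name only (the last grouping variable is dropped), so top_n(1) picks the top row per name rather than per region. Regrouping by region fixes that, i.e.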

data %>%
   group_by(name, region) %>%
   summarize(total = sum(num)) %>%
   group_by(region) %>%
   top_n(1)

#Selecting by total
# A tibble: 2 x 3
# Groups:   region [2]
#  name  region total
#  <chr> <chr>  <dbl>
#1 doe   east      30
#2 joe   west      30
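
As a possible alternative, here is a minimal sketch using slice_max(), which supersedes top_n() in dplyr 1.0.0 and later. It assumes that version of dplyr is installed and rebuilds the question's data so it runs on its own; the variable name `result` is only illustrative.

```r
library(tibble)
library(dplyr)   # assumes dplyr >= 1.0.0 for slice_max() and .groups

data <- tribble(
  ~name, ~region, ~size, ~num,
  "joe", "east", "small", 10,
  "moe", "east", "small", 20,
  "doe", "east", "small", 30,
  "joe", "west", "small", 30,
  "moe", "west", "small", 20,
  "doe", "west", "small", 10
)

result <- data %>%
  group_by(region, name) %>%
  # Sum num per region/name pair and drop all grouping afterwards
  summarise(total = sum(num), .groups = "drop") %>%
  # Regroup by region only, then keep the row(s) with the largest total
  group_by(region) %>%
  slice_max(total, n = 1) %>%
  ungroup()

result
```

Like top_n(), slice_max() keeps ties by default, so a region with two equally ranked names would return both rows; passing with_ties = FALSE would keep only one.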
Sotos

If you want the result to be ordered by region and then by name, you may want to use:

group_by(region, name)
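
To make the effect of that grouping order concrete, here is a small sketch that rebuilds the question's data. With region listed first, summarize() drops only the last grouping variable (name) by default, so the summary is still grouped by region and top_n() can be applied directly without a second group_by. The name `by_region` is illustrative, and `total` is passed explicitly to top_n() to avoid the "Selecting by" message.

```r
library(tibble)
library(dplyr)

data <- tribble(
  ~name, ~region, ~size, ~num,
  "joe", "east", "small", 10,
  "moe", "east", "small", 20,
  "doe", "east", "small", 30,
  "joe", "west", "small", 30,
  "moe", "west", "small", 20,
  "doe", "west", "small", 10
)

by_region <- data %>%
  group_by(region, name) %>%
  # Default behaviour drops the last group (name); region grouping remains
  summarize(total = sum(num)) %>%
  # Applied per region, ranking rows by total
  top_n(1, total)

by_region
```

The columns come out as region, name, total, with rows sorted by region.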