Here is a way with slice_max
. I have defined two variables, ties_ok
and max_n
. The latter is set to 3 to test the code, you want max_n <- 110
, the former can bee set to FALSE
if you want to discard ties and keep only the first rows found.
df1 <- " gene tail_len
1 SPAC20G4.06c 3
2 SPCC613.06 5
3 SPAC6F6.03c 2
4 SPAC20G4.06c 3
5 SPBC23G7.15c 5
6 SPAC589.10c 2
7 SPBC23G7.15c 3
8 SPAC22H12.04c 1
9 SPAC22H12.04c 12
10 SPAC6G10.11c 8
11 SPAC589.10c 31
12 SPBC18E5.06 16"
df1 <- read.table(text = df1, header = TRUE)
suppressPackageStartupMessages(
library(dplyr)
)
ties_ok <- TRUE
#ties_ok <- FALSE
max_n <- 3L
df1 %>%
group_by(gene) %>%
summarise(count = n(), mean_tail_len = mean(tail_len)) %>%
slice_max(count, n = max_n, with_ties = ties_ok) %>%
select(-count)
#> # A tibble: 4 × 2
#> gene mean_tail_len
#> <chr> <dbl>
#> 1 SPAC20G4.06c 3
#> 2 SPAC22H12.04c 6.5
#> 3 SPAC589.10c 16.5
#> 4 SPBC23G7.15c 4
Created on 2023-01-20 with reprex v2.0.2