I'm trying to speed up my code using the tips outlined in this recent blog post (https://www.tidyverse.org/blog/2023/04/performant-packages/). I've managed to replace some of my simpler filter and mutate calls and get slightly faster code. However, there is one section where I can't figure out how to do this, and I'd love some guidance.
The blog post mentions vec_chop, list_unchop and vec_rep_each, but I haven't figured out how to use the indices to do this on a small dataset, let alone a large one. This is the step I'd like to speed up:
df2 <- df1 %>%
  group_by(clnt_label, term1, term2) %>%
  filter(row_number() == 1)
Dummy data:
library(dplyr)

df1 <- tibble(clnt_label = rep(LETTERS[1:3], each = 10),
              term1 = rev(rep(LETTERS[1:3], times = 10)),
              term2 = rep(LETTERS[1:3], each = 5, times = 2))
Any thoughts/advice would be appreciated!
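For reference, here is a minimal sketch of what an index-based version might look like with vctrs, run on the dummy data above. It is not taken from the blog post: it assumes that "first row per group" can be expressed as "first occurrence of each unique key combination", and the object names (keys, first_loc, df2_vctrs, and so on) are only illustrative.

```r
library(dplyr)
library(vctrs)

# Dummy data from the question
df1 <- tibble(clnt_label = rep(LETTERS[1:3], each = 10),
              term1 = rev(rep(LETTERS[1:3], times = 10)),
              term2 = rep(LETTERS[1:3], each = 5, times = 2))

# Data frame holding only the grouping columns
keys <- df1[c("clnt_label", "term1", "term2")]

# Row positions of the first occurrence of each unique key combination
first_loc <- vec_unique_loc(keys)

# Keep only those rows; no grouped data frame is ever built
df2_vctrs <- vec_slice(df1, first_loc)

# Alternative that makes the per-group indices explicit:
# vec_group_loc() returns one row per group, with the row positions in `loc`
groups <- vec_group_loc(keys)
first_loc_alt <- vapply(groups$loc, function(i) i[[1]], integer(1))
df2_groups <- vec_slice(df1, first_loc_alt)
```

Both variants should select the same rows as the group_by() + filter(row_number() == 1) version, in order of first appearance, but they return an ungrouped tibble.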
EDIT:
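Tried out a solution mentioned by Axeman and had an idea of my own. I tested these on my full dataset (all_pairs3) instead of the dummy set, running each approach once. distinct() was by far the fastest of the approaches I've tested so far: about 0.24 s, versus about 32 s for the original group_by() + filter(), about 144 s for slice(1) and about 214 s for slice_head(n = 1).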
microbenchmark::microbenchmark(
  all_pairs4_old <- all_pairs3 %>%
    group_by(clnt_label, term1, term2) %>%
    filter(row_number() == 1),
  times = 1, unit = "millisecond")
# Unit: milliseconds
#       min       lq     mean   median neval
#  31938.37 31938.37 31938.37 31938.37     1
microbenchmark::microbenchmark(
  all_pairs4_head <- all_pairs3 %>%
    group_by(clnt_label, term1, term2) %>%
    slice_head(n = 1),
  times = 1, unit = "millisecond")
# Unit: milliseconds
#       min       lq     mean   median neval
#  214474.4 214474.4 214474.4 214474.4     1
microbenchmark::microbenchmark(
  all_pairs4_slice <- all_pairs3 %>%
    group_by(clnt_label, term1, term2) %>%
    slice(1),
  times = 1, unit = "millisecond")
# Unit: milliseconds
#       min       lq     mean   median neval
#  144225.7 144225.7 144225.7 144225.7     1
microbenchmark::microbenchmark(
  all_pairs4_distinct <- all_pairs3 %>%
    distinct(clnt_label, term1, term2, .keep_all = TRUE),
  times = 1, unit = "millisecond")
# Unit: milliseconds
#       min       lq     mean   median neval
#  242.9775 242.9775 242.9775 242.9775     1
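As a sanity check on the swap, here is a small sketch on the dummy data confirming that distinct() with .keep_all = TRUE keeps the same rows as the original group_by() + filter(). The names res_filter and res_distinct are only illustrative, and this checks equivalence on the toy data only, not on all_pairs3.

```r
library(dplyr)

# Dummy data from the question
df1 <- tibble(clnt_label = rep(LETTERS[1:3], each = 10),
              term1 = rev(rep(LETTERS[1:3], times = 10)),
              term2 = rep(LETTERS[1:3], each = 5, times = 2))

# Original approach: first row of each group, then drop the grouping
res_filter <- df1 %>%
  group_by(clnt_label, term1, term2) %>%
  filter(row_number() == 1) %>%
  ungroup()

# distinct() keeps the first row for each unique key combination
res_distinct <- df1 %>%
  distinct(clnt_label, term1, term2, .keep_all = TRUE)

# TRUE when both approaches keep the same rows in the same order
same_rows <- isTRUE(all.equal(as.data.frame(res_filter),
                              as.data.frame(res_distinct)))
```

One difference to keep in mind: the group_by() versions return a grouped tibble unless ungroup() is called, whereas distinct() on ungrouped input returns an ungrouped one.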