Select last row within each group with dplyr is slow

Question

I have the following R code. It sorts the dataset by postcode and paon, groups it by id, and keeps only the last row within each group. However, it takes more than 3 hours to run.

I am not sure what I am doing wrong with my code since there is no for loop here.

epc2 is a data frame with 324,368 rows.

epc3 <- epc2 %>%
  arrange(postcode, paon) %>% 
  group_by(id) %>% 
  do(tail(., 1))

Thank you for any and all of your help.

`do` is slow. Try other alternatives: [Select first and last row from grouped data](https://stackoverflow.com/questions/31528981/select-first-and-last-row-from-grouped-data), [How to select the first and last row within a grouping variable in a data frame?](https://stackoverflow.com/questions/8203818/how-to-select-the-first-and-last-row-within-a-grouping-variable-in-a-data-frame) — Henrik, Feb 15 '19 at 06:59
[last by group for all columns data.table](https://stackoverflow.com/questions/14143220/last-by-group-for-all-columns-data-table) — Henrik, Feb 15 '19 at 07:04
The data.table approach is likely going to be the fastest, but it should already be faster if you replace your last line by `summarize_all(last)` — meriops, Feb 15 '19 at 08:16
Thank you for your replies. summarize_all(last) did the trick for me — D M, Feb 15 '19 at 08:25
A similar case is described in [Select the first row by group](https://stackoverflow.com/questions/13279582/select-the-first-row-by-group/50955051#50955051). I recommend dplyr::group_by, dplyr::filter combined with dplyr::row_number to solve issues like this — Kresten, Feb 15 '19 at 10:13
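The two dplyr alternatives suggested in the comments above can be sketched as follows. This is only an illustration: the small `epc2` data frame is a hypothetical stand-in, and only its column names (`id`, `postcode`, `paon`) come from the question. Note that `summarise_all()` still works but is superseded by `across()` in dplyr 1.0.0 and later.

```r
library(dplyr)

# Hypothetical stand-in for epc2; only the column names come from the question.
epc2 <- data.frame(
  id       = c(1, 1, 2, 2, 2),
  postcode = c("B1", "A1", "C3", "C1", "C2"),
  paon     = c("5", "7", "1", "2", "3"),
  stringsAsFactors = FALSE
)

# Sort once, before grouping, as in the question.
sorted <- epc2 %>% arrange(postcode, paon)

# Comment suggestion 1: take the last value of every column within each group.
by_summarise <- sorted %>%
  group_by(id) %>%
  summarise_all(last)

# Comment suggestion 2: keep the row whose position equals the group size.
by_filter <- sorted %>%
  group_by(id) %>%
  filter(row_number() == n()) %>%
  ungroup()
```

Both versions avoid calling `tail()` once per group through `do()`, which is the part the comments identify as slow.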

score 1 · Accepted Answer · answered Feb 15 '19 at 17:38

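How about `slice(n())`, which keeps the last row of each group:

library(dplyr)

mtcars %>% 
  arrange(cyl) %>%   # sort first
  group_by(cyl) %>% 
  slice(n())         # n() is the group size, so this picks the last row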
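Applied to the pipeline from the question, the same idea might look like the sketch below. The tiny `epc2` data frame here is made up for illustration; only the column names are taken from the question, and the trailing `ungroup()` is an optional addition rather than part of the answer.

```r
library(dplyr)

# Hypothetical sample data; the real epc2 has 324,368 rows.
epc2 <- data.frame(
  id       = c(10, 20, 10, 20),
  postcode = c("AB1", "AB2", "AB3", "AB0"),
  paon     = c("1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

epc3 <- epc2 %>%
  arrange(postcode, paon) %>%  # same ordering as in the question
  group_by(id) %>%
  slice(n()) %>%               # last row of each id group, replacing do(tail(., 1))
  ungroup()                    # drop the grouping so later steps are not grouped
```

Here `epc3` holds one row per `id`: the row that sorts last by `postcode` and then `paon` within that group.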

davsjob
