6

I have some data similar in structure to:

a <- data.frame("ID" = c("A", "A", "B", "B", "C", "C"),
                "NUM" = c(1, 2, 4, 3, 6, 9),
                "VAL" = c(1, 0, 1, 0, 1, 0))

And I am trying to sort it by ID and NUM then get the last row. This code works to get the last row and summarize down to a unique ID, however, it doesn't actually get the full last row like I want.

a <- a %>% arrange(ID, NUM) %>%
  group_by(ID) %>%
  summarise(max(NUM))

I understand why this code doesn't work but am looking for the dplyr way of getting the last row for each unique ID

Expected Results:

  ID        NUM     VAL
  <fct    <dbl>    <dbl>
1 A           2       0
2 B           4       1
3 C           9       0

Note: I will admit that though it is nearly a duplicate of Select first and last row from grouped data, the answers on that thread were not quite what I was looking for.

Bear
  • 662
  • 1
  • 5
  • 20

4 Answers4

16

One dplyr option could be:

a %>%
 arrange(ID, NUM) %>% 
 group_by(ID) %>% 
 summarise_all(last)

  ID      NUM   VAL
  <fct> <dbl> <dbl>
1 A        2.    0.
2 B        4.    1.
3 C        9.    0.

Or since dplyr 1.0.0:

a %>%
 arrange(ID, NUM) %>% 
 group_by(ID) %>% 
 summarise(across(everything(), last))

Or using slice_max():

a %>%
    group_by(ID) %>%
    slice_max(order_by = NUM, n = 1)
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
15

You might try:

a %>% 
  group_by(ID) %>% 
  arrange(NUM) %>%  
  slice(n())
Bear
  • 662
  • 1
  • 5
  • 20
joran
  • 169,992
  • 32
  • 429
  • 468
4

tail() returns the last 6 items of a subsettable object. When using aggregate(), the parameters to the FUN argument are passed immediately after the function using a comma; here 1 refers to n = 1, which tells tail() to only return the last item.

aggregate(a[, c('NUM', 'VAL')], list(a$ID), tail, 1)

# Group.1 NUM VAL
# 1       A   2   0
# 2       B   3   0
# 3       C   9   0
12b345b6b78
  • 995
  • 5
  • 16
  • 2
    Would be helpful to wrap this in some explanation of what this does, and why it solves the OPs problem. – SiKing Dec 04 '18 at 20:41
0

You can use top_n. (grouping already sorts by ID, and sorting by NUM isn't necessary since there's only 1 value)

library(dplyr)

a %>% 
  group_by(ID) %>% 
  top_n(1, NUM)

# # A tibble: 3 x 3
# # Groups:   ID [3]
#   ID      NUM   VAL
#   <fct> <dbl> <dbl>
# 1 A         2     0
# 2 B         4     1
# 3 C         9     0
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38