18

Given a dataframe:

df <- structure(list(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4), b = c(34, 
343, 54, 11, 55, 62, 59, -9, 0, -0.5)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

I want to take last N observations / rows from each group:

df %>% 
dplyr::group_by(a) %>% 
dplyr::last(2)

Gives me wrong results.

I want it to be:

a   b
1 343
1  54
2  55
2  62
3  59
3  -9
4   0
4  -0.5

Please advise what is wrong here?

The error I get is:

Error in order(order_by)[[n]] : subscript out of bounds

SteveS
  • 3,789
  • 5
  • 30
  • 64
  • Possible duplicate of [Select first and last row from grouped data](https://stackoverflow.com/questions/31528981/select-first-and-last-row-from-grouped-data) – tjebo Jan 02 '19 at 15:39

5 Answers5

46

As it is a specific question based on dplyr

1) after the group_by, use slice on the row_number()

library(tidyverse)
df %>% 
   group_by(a) %>% 
   slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups:   a [4]
#      a      b
#  <dbl>  <dbl>
#1     1  343  
#2     1   54  
#3     2   55  
#4     2   62  
#5     3   59  
#6     3   -9  
#7     4    0  
#8     4   -0.5

2) Or use filter from dplyr

df %>% 
   group_by(a) %>% 
   filter(row_number() >= (n() - 1))

3) or with do and tail

df %>%
    group_by(a) %>% 
    do(tail(., 2))

4) In addition to the tidyverse, methods, we can also use compact data.table

library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]

5) Or by from base R

by(df, df$a, FUN = tail, 2)

6) or with aggregate from base R

df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]

7) or with split from base R

do.call(rbind, lapply(split(df, df$a), tail, 2))
akrun
  • 874,273
  • 37
  • 540
  • 662
5

Update

dplyr 1.1.0 introduced the .by/by argument to slice() family of verbs to allow for one-time grouping operations:

library(dplyr)

df %>% 
  slice_tail(n = 2, by = a)

Note: at this time by is experimental.


dplyr 1.0.0 introduced slice_tail that makes this simple:

library(dplyr)

df %>% 
  group_by(a) %>% 
  slice_tail(n = 2)

Similarly, there is slice_head to get the first n rows.

LMc
  • 12,577
  • 3
  • 31
  • 43
2

A base R option using tapply is to subset the last two rows for every group.

df[unlist(tapply(1:nrow(df), df$a, tail, 2)), ]

#     a      b
#  <dbl>  <dbl>
#1     1  343  
#2     1   54  
#3     2   55  
#4     2   62  
#5     3   59  
#6     3   -9  
#7     4    0  
#8     4   -0.5

Or another option using ave

df[as.logical(with(df, ave(1:nrow(df), a, FUN = function(x) x %in% tail(x, 2)))), ]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

Also a tidyverse possibility:

df %>%
 group_by(a) %>%
 top_n(2, row_number())

      a        b
  <dbl>    <dbl>
1    1.  343.   
2    1.   54.0  
3    2.   55.0  
4    2.   62.0  
5    3.   59.0  
6    3.   -9.00 
7    4.    0.   
8    4.   -0.500

It is taking the top two rows given the row numbers per groups.

tmfmnk
  • 38,881
  • 4
  • 47
  • 67
0

Try tail().In R head function allows you to preview the first n rows, while tail allows you to preview last n rows.

micstr
  • 5,080
  • 8
  • 48
  • 76
Bin
  • 209
  • 1
  • 3
  • 8
  • Note for future readers. As stated by @akrun in his reply, `tail()` function works within the entire table, not working with `group_by()`. As @LMc says, use `slice_tail()` from `dplyr` – RobertoT Feb 13 '23 at 12:50