I want to get the first and last value of each group, where the groups are runs of consecutive equal values, the way the rle() function defines them.

For example I have this data frame:

> df
   df time
1   1    A
2   1    B
3   1    C
4   1    D
5   2    E
6   2    F
7   2    G
8   1    H
9   1    I
10  1    J
11  3    K
12  3    L
13  3    M
14  2    N
15  2    O
16  2    P
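
To make the example reproducible, the data frame can be rebuilt roughly like this. This is a sketch: the use of rep() and LETTERS and the numeric column type are assumptions, since only the printed output is shown above.

```r
# Rebuild the example data: five runs of the values 1, 2, 1, 3, 2
# with lengths 4, 3, 3, 3, 3, and the letters A..P as the time column.
df <- data.frame(
  df   = rep(c(1, 2, 1, 3, 2), times = c(4, 3, 3, 3, 3)),
  time = LETTERS[1:16],
  stringsAsFactors = FALSE  # keep time as character, not factor
)

# Sanity check: the runs should match the rle() output quoted in the question
r <- rle(df$df)
r$lengths
r$values
```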

I want to get something like this:

> want
  df first last
1  1     A    D
2  2     E    G
3  1     H    J
4  3     K    M
5  2     N    P

As you can see, I want to group my values the way rle() does: elements should be grouped only when the same value appears in consecutive rows. group_by groups differently, putting all rows with the same value into one group regardless of their position.

> rle(df$df)
Run Length Encoding
  lengths: int [1:5] 4 3 3 3 3
  values : num [1:5] 1 2 1 3 2

Is there a solution for my problem? Any advice will be appreciated.

Sotos
Jo.Hen

2 Answers

The rleid function from data.table does exactly this: it assigns a run id to each row, which you can then group by, i.e.

library(data.table)

# Convert df to a data.table by reference, then group by the run id
setDT(df)[, .(df = head(df, 1), 
              first = head(time, 1), 
              last = tail(time, 1)), 
      by = .(grp = rleid(df))][, grp := NULL][]

Which gives,

   df first last
1:  1     A    D
2:  2     E    G
3:  1     H    J
4:  3     K    M
5:  2     N    P

Adding a dplyr approach, as @RonakShah mentions

library(dplyr)

df %>% 
 group_by(grp = cumsum(c(0, diff(df)) != 0)) %>% 
 summarise(df = first(df), 
           first = first(time), 
           last = last(time)) %>% 
 select(-grp)

Giving,

# A tibble: 5 x 3
     df first  last
  <int> <chr> <chr>
1     1     A     D
2     2     E     G
3     1     H     J
4     3     K     M
5     2     N     P
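
Both approaches rest on the same idea: give every run of consecutive equal values its own id, then group by that id. A minimal base R sketch of what such a run id looks like for this data (the names x, id_rle and id_cumsum are only illustrative, and x is assumed to be the numeric df column from the question):

```r
# The df column from the question, rebuilt as a plain vector
x <- rep(c(1, 2, 1, 3, 2), times = c(4, 3, 3, 3, 3))

# Run id via rle(): repeat the run number once per element of the run
r <- rle(x)
id_rle <- rep(seq_along(r$values), r$lengths)

# Run id via cumsum(): add 1 every time the value changes
id_cumsum <- cumsum(c(TRUE, diff(x) != 0))

id_rle
# 1 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
all(id_rle == id_cumsum)
# TRUE
```

The dplyr call above uses the same cumsum trick, except that its ids start at 0 rather than 1, which makes no difference for grouping.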
Sotos
  • Thank you very much. This is exactly what I was looking for! – Jo.Hen Aug 04 '17 at 07:20
    @RonakShah if you're using `dplyr`, might as well do `first = first(time), last = last(time)`. Nice and clear. – Gregor Thomas Aug 04 '17 at 07:22
  • The second approach gives me an error: `Error in overscope_eval_next(overscope, expr) : object 'grp' not found`. Do you know why? – Jo.Hen Aug 04 '17 at 07:41
  • It's the `group_by` that gives the problem. Maybe try renaming the `df` variable to something else, since `df` is also the name of the data frame. – Sotos Aug 04 '17 at 07:43

Here is an option using base R with rle. We run rle on the first column and repeat each run's sequence number by its length, which gives a run id for every row. Applying duplicated to that id (from the start and from the end) yields logical indexes marking the first and last row of each run, and we use those to subset the original dataset.

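rl <- rle(df[,1])
# run id for every row: 1 1 1 1 2 2 2 3 3 3 ...
i1 <- rep(seq_along(rl$values), rl$lengths)
i2 <- !duplicated(i1)                    # TRUE at the first row of each run
i3 <- !duplicated(i1, fromLast = TRUE)   # TRUE at the last row of each run
wanted <- data.frame(df = df[i2,1], first = df[i2,2], last = df[i3,2])
wanted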
#   df first last
#1  1     A    D
#2  2     E    G
#3  1     H    J
#4  3     K    M
#5  2     N    P
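
The same result can also be reached without duplicated, by computing the start and end row of each run directly from the run lengths. This is a different technique from the one above, shown as a sketch; the data construction and the names starts, ends and wanted2 are assumptions for illustration.

```r
# Example data rebuilt from the question (construction is an assumption)
df <- data.frame(
  df   = rep(c(1, 2, 1, 3, 2), times = c(4, 3, 3, 3, 3)),
  time = LETTERS[1:16],
  stringsAsFactors = FALSE
)

rl <- rle(df$df)
ends   <- cumsum(rl$lengths)       # last row of each run
starts <- ends - rl$lengths + 1    # first row of each run

wanted2 <- data.frame(df    = rl$values,
                      first = df$time[starts],
                      last  = df$time[ends])
wanted2
```

Here rl$values already holds one value per run, so only the time column needs to be indexed.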
akrun