How to find the first and last occurrence in a panel data set in R

Question

I have a table:

And I want to convert it to:

id  first last
1      1     5
2      3     7
3      8    14

Please help!

Possible duplicate of [Extract rows for the first occurrence of a variable in a data frame](http://stackoverflow.com/questions/19944334/extract-rows-for-the-first-occurrence-of-a-variable-in-a-data-frame) — JazzCat, Apr 03 '16 at 17:16

akrun · Answer 1 · 2016-04-03T17:44:44.227

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'id', we get the first and last value of 'time'

library(data.table)
setDT(df1)[, list(firstocc = time[1L], lastocc = time[.N]),
                    by = id]

Or with dplyr, we use the same methodology.

library(dplyr)
df1 %>% 
    group_by(id) %>%
    summarise(firstocc = first(time), lastocc = last(time))

Or with base R (no packages needed)

do.call(rbind, lapply(split(df1, df1$id), 
    function(x) data.frame(id = x$id[1],
       firstocc = x$time[1], lastocc = x$time[nrow(x)])))

If we need to be based on the min and max values (not related to the expected output) , the data.table option is

setDT(df1)[, setNames(as.list(range(time)),
                 c('firstOcc', 'lastOcc')) ,id]

and dplyr is

df1 %>%
   group_by(id) %>%
   summarise(firstocc = min(time), lastocc = max(time))

When you group by id I don't want to order id. I want to let it be in the order it is in. Can you help me with that? — Preet Rajdeo, Apr 03 '16 at 18:42
@PreetRajdeo You can convert the 'id' to `factor` class and set the levels iin the same order. i.e. `df1 %>% group_by(id = factor(id, levels = unique(id))) %>% summarise(firstocc = min(time), lastocc = max(time))` — akrun, Apr 04 '16 at 03:11

G. Grothendieck · Answer 2 · 2016-04-03T18:14:39.303

There are many packages that can perform aggregation of this sort in R. We show how to do it without any packages and then show it with some packages.

1) Use aggregate. No packages needed.

ag <- aggregate(time ~ id, DF, function(x) c(first = min(x), last = max(x)))

giving:

> ag
  id time.first time.last
1  1          1         5
2  2          2         7
3  3          3        14

ag is a two column data frame whose second column contains a two column matrix with columns named 'first' and 'last'. If you want to flatten it to a 3 column data frame use:

do.call("cbind", ag)

giving:

     id first last
[1,]  1     1    5
[2,]  2     2    7
[3,]  3     3   14

1a) This variation of (1) is more compact at the expense of uglier column names.

aggregate(time ~ id, DF, range)

2) sqldf

library(sqldf)
sqldf("select id, min(time) first, max(time) last from DF group by id")

giving:

     id first last
[1,]  1     1    5
[2,]  2     2    7
[3,]  3     3   14

3) summaryBy summaryBy in the doBy package is much like aggregate:

library(doBy)

summaryBy(time ~ id, data = DF, FUN = c(min, max))

giving:

  id time.min time.max
1  1        1        5
2  2        2        7
3  3        3       14

Note: Here is the input DF in reproducible form:

Lines <- "id  time
1   1
1   2
1   5
2   3
2   2
2   7
3   8
3   3
3   14"
DF <- read.table(text = Lines, header = TRUE)

Update: Added (1a), (2) and (3) and fixed (1).

score 1 · Answer 3 · answered Apr 03 '16 at 17:31

You can remove duplicates and reshape it

dd <- read.table(header = TRUE, text = "id  time
1   1
1   2
1   5
2   3
2   2
2   7
3   8
3   3
3   14")

d2 <- dd[!(duplicated(dd$id) & duplicated(dd$id, fromLast = TRUE)), ]
reshape(within(d2, tt <- c('first', 'last')), dir = 'wide', timevar = 'tt')

#   id time.first time.last
# 1  1          1         5
# 4  2          3         7
# 7  3          8        14

How to find the first and last occurrence in a panel data set in R

3 Answers3