3

My actual dataset is composed of repeated measurements for each id, where the number of measurements can vary across individuals. A simplified example is:

dat <- data.frame(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L))
dat
##    id
## 1   1
## 2   1
## 3   1
## 4   1
## 5   1
## 6   1
## 7   2
## 8   2
## 9   3
## 10  3
## 11  3

I am trying to sequentially number the dat rows by the id variable. The result should be:

dat
##    id s
## 1   1 1
## 2   1 2
## 3   1 3
## 4   1 4
## 5   1 5
## 6   1 6
## 7   2 1
## 8   2 2
## 9   3 1
## 10  3 2
## 11  3 3

How would you do that? I tried to select the last row of each id by using duplicated(), but this is probably not the way, since it works with the entire column.

A5C1D2H2I1M1N2O1R2T1
Stefano Lombardi
  • This question seems pretty similar. In fact, I see now that is where I got my answer: http://stackoverflow.com/questions/8209015/observation-number-by-group – Mark Miller Jan 12 '13 at 16:08

3 Answers

10

Use ave(). The first argument is the vector the function is applied to, the remaining arguments are the grouping variables, and FUN is the function applied within each group. See ?ave for more details.

transform(dat, s = ave(id, id, FUN = seq_along))
#    id s
# 1   1 1
# 2   1 2
# 3   1 3
# 4   1 4
# 5   1 5
# 6   1 6
# 7   2 1
# 8   2 2
# 9   3 1
# 10  3 2
# 11  3 3
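One property of ave() worth illustrating is that it returns its result in the original row order, so the numbering should still be correct when the rows are not sorted by id. The sketch below uses a small made-up data frame, dat2, which is an assumption for illustration and not part of the question's data:

```r
# Hypothetical unsorted data (dat2 is illustrative, not from the question)
dat2 <- data.frame(id = c(2L, 1L, 2L, 3L, 1L, 1L))

# ave() splits by id, applies seq_along() within each group,
# and puts the results back in the original row positions
dat2$s <- ave(dat2$id, dat2$id, FUN = seq_along)
dat2
```

Each row gets its position within its own id group, regardless of where the group's rows sit in the data frame.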

If you have a large dataset or are using the data.table package, you can use the special symbol .N (the number of rows in each group) as follows:

library(data.table)
DT <- data.table(dat)
DT[, s := 1:.N, by = "id"]
## Or
## DT[, s := sequence(.N), id][]

Or, you can use rowid, like this:

library(data.table)
setDT(dat)[, s := rowid(id)][]
#     id s
#  1:  1 1
#  2:  1 2
#  3:  1 3
#  4:  1 4
#  5:  1 5
#  6:  1 6
#  7:  2 1
#  8:  2 2
#  9:  3 1
# 10:  3 2
# 11:  3 3

For completeness, here's the "tidyverse" approach:

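library(dplyr)  # part of the tidyverse; provides group_by(), mutate(), row_number()
dat %>% 
  group_by(id) %>% 
  mutate(s = row_number())  # position of each row within its id group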
## # A tibble: 11 x 2
## # Groups: id [3]
##       id     s
##    <int> <int>
##  1     1     1
##  2     1     2
##  3     1     3
##  4     1     4
##  5     1     5
##  6     1     6
##  7     2     1
##  8     2     2
##  9     3     1
## 10     3     2
## 11     3     3
A5C1D2H2I1M1N2O1R2T1
3
dat <- read.table(text = "
    id          
    1 
    1 
    1 
    1 
    1 
    1 
    2 
    2 
    3 
    3 
    3", 
header=TRUE)

data.frame(
    id = dat$id,
    s = sequence(rle(dat$id)$lengths) 
)

Gives:

   id s
1   1 1
2   1 2
3   1 3
4   1 4
5   1 5
6   1 6
7   2 1
8   2 2
9   3 1
10  3 2
11  3 3
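
A caveat that may be worth keeping in mind: rle() counts runs of consecutive equal values, so this approach assumes that all rows for a given id are adjacent, as they are in the example data. The sketch below uses a made-up vector, ids, which is an assumption for illustration, to show how the run-based result can differ from a group-based one such as ave():

```r
# Hypothetical ids where the rows for id 1 are not adjacent
ids <- c(1L, 2L, 1L)

# rle() sees three runs of length 1, so the counter restarts each time
run_based <- sequence(rle(ids)$lengths)

# ave() groups by value, so the second occurrence of id 1 is numbered 2
group_based <- ave(ids, ids, FUN = seq_along)

run_based
group_based
```

If the data are not already grouped, sorting by id first (for example with dat[order(dat$id), , drop = FALSE]) should make the rle() approach applicable.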
Mark Miller
1

You can also use tapply(), though it is not as elegant as ave():

cbind(dat$id, unlist(tapply(dat$id, dat$id, seq_along)))
  [,1] [,2]
11    1    1
12    1    2
13    1    3
14    1    4
15    1    5
16    1    6
21    2    1
22    2    2
31    3    1
32    3    2
33    3    3
agstudy
  • If you look at the function for `ave()`, you'll see that it contains [your question from earlier today](http://stackoverflow.com/q/14294052/1270695) ;) – A5C1D2H2I1M1N2O1R2T1 Jan 12 '13 at 15:59
  • @AnandaMahto Thanks, but I know that. You were faster than me with ave(), so I changed mine at the last minute. – agstudy Jan 12 '13 at 18:53