R create ID within a group

Question

I have the following dataset:

df<-structure(list(IDFAM = c("2010 7599 2996 1", "2010 7599 3071 1", 
"2010 7599 3071 1", "2010 7599 3660 1", "2010 7599 4736 1", "2010 7599 6235 1", 
"2010 7599 6299 1", "2010 7599 9903 1", "2010 7599 11013 1", 
"2010 7599 11778 1", "2010 7599 11778 1", "2010 7599 12248 1", 
"2010 7599 13127 1", "2010 7599 14261 1", "2010 7599 16280 1", 
"2010 7599 16280 1", "2010 7599 16280 1", "2010 7599 16280 1", 
"2010 7599 16280 1", "2010 7599 17382 1"), AGED = c(45L, 47L, 
24L, 46L, 46L, 44L, 43L, 43L, 43L, 16L, 43L, 46L, 44L, 47L, 43L, 
16L, 20L, 18L, 18L, 43L)), .Names = c("IDFAM", "AGED"), row.names = c("5614", 
"5748", "5753", "6864", "8894", "11761", "11884", "18738", "20896", 
"22351", "22353", "23267", "24939", "27072", "30946", "30947", 
"30949", "30950", "30952", "33034"), class = "data.frame")

I would like to number the observations that share the same IDFAM value from 1 to n, where n is the number of observations with that IDFAM value. This would result in the following table:

IDFAM              AGED     ID
2010 7599 2996 1    45       1
2010 7599 3071 1    47       1
2010 7599 3071 1    24       2
2010 7599 3660 1    46       1
2010 7599 4736 1    46       1
2010 7599 6235 1    44       1
2010 7599 6299 1    43       1
2010 7599 9903 1    43       1
2010 7599 11013 1   43       1
2010 7599 11778 1   16       1
2010 7599 11778 1   43       2
2010 7599 12248 1   46       1
2010 7599 13127 1   44       1
2010 7599 14261 1   47       1
2010 7599 16280 1   43       1
2010 7599 16280 1   16       2
2010 7599 16280 1   20       3
2010 7599 16280 1   18       4
2010 7599 16280 1   18       5
2010 7599 17382 1   43       1

How can I do this? Thanks.

score 24 · Accepted Answer · edited Jan 24 '17 at 18:19

There are several ways.

In base R, use ave:

with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
#  [1] 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 4 5 1
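To keep the result, you can assign it to a new column. Here is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not the question's data), with rows of the same family deliberately not adjacent to show that `ave` keeps the original row order:

```r
# Hypothetical toy data: families "a" and "b" are interleaved on purpose.
toy <- data.frame(IDFAM = c("b", "a", "b", "a", "a"),
                  AGED  = c(45L, 47L, 24L, 43L, 16L),
                  stringsAsFactors = FALSE)

# seq_along() restarts at 1 inside each IDFAM group; ave() writes each
# group's result back into the positions those rows had originally.
toy$ID <- ave(seq_len(nrow(toy)), toy$IDFAM, FUN = seq_along)

toy$ID
# [1] 1 1 2 2 3
```

The first argument to `ave` only needs to be a vector with one element per row; its values are ignored because `seq_along` looks only at the length of each group.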

With the "data.table" package, use sequence(.N):

library(data.table)
DT <- as.data.table(df)
DT[, ID := sequence(.N), by = IDFAM]

With the "dplyr" package, try:

library(dplyr)
df %>% group_by(IDFAM) %>% mutate(count = sequence(n()))

or (as recommended by Hadley in the comments):

df %>% group_by(IDFAM) %>% mutate(count = row_number())

Update

Since this seems to be something that is asked for relatively frequently, this feature has been added as a function (getanID) in my "splitstackshape" package. It is based on the "data.table" approach above.

library(splitstackshape)
getanID(df, id.vars = "IDFAM")
#                 IDFAM AGED .id
#  1:  2010 7599 2996 1   45   1
#  2:  2010 7599 3071 1   47   1
#  3:  2010 7599 3071 1   24   2
#  4:  2010 7599 3660 1   46   1
#  5:  2010 7599 4736 1   46   1
#  6:  2010 7599 6235 1   44   1
#  7:  2010 7599 6299 1   43   1
#  8:  2010 7599 9903 1   43   1
#  9: 2010 7599 11013 1   43   1
# 10: 2010 7599 11778 1   16   1
# 11: 2010 7599 11778 1   43   2
# 12: 2010 7599 12248 1   46   1
# 13: 2010 7599 13127 1   44   1
# 14: 2010 7599 14261 1   47   1
# 15: 2010 7599 16280 1   43   1
# 16: 2010 7599 16280 1   16   2
# 17: 2010 7599 16280 1   20   3
# 18: 2010 7599 16280 1   18   4
# 19: 2010 7599 16280 1   18   5
# 20: 2010 7599 17382 1   43   1

edited Jan 24 '17 at 18:19

Gregor Thomas


answered Apr 21 '14 at 12:42

A5C1D2H2I1M1N2O1R2T1


Thanks, I particularly like the `with`/`ave` method. Simple and efficient. – user2568648 Apr 21 '14 at 12:49
Better to use `row_number()` in dplyr. – hadley Apr 21 '14 at 13:00
@AnandaMahto. I must say. All this is very nice! – Paulo E. Cardoso Apr 21 '14 at 13:01
@hadley, not sure if I used `row_number()` correctly in my edit here. Is that what you had in mind? – A5C1D2H2I1M1N2O1R2T1 Apr 21 '14 at 16:19
Yes, that's right. In dplyr 0.2, you'll be able to use it bare, `count = row_number()`. – hadley Apr 21 '14 at 17:45
There's nothing wrong with using sequence, but using `row_number()` will also work with SQL backends, and knowing the common [window functions](http://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html) allows you to solve a wide class of problems. – hadley Apr 21 '14 at 17:46
What's the difference between using seq_len(.N) or sequence(.N) ? – skan Nov 03 '16 at 14:40
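On the `seq_len` versus `sequence` question, a short base R sketch may help. For a single count, which is what `.N` is inside a `by` group, the two give the same numbers; `sequence` additionally accepts a vector of counts and concatenates one run per count. The variable names below are just for illustration:

```r
n <- 3L

a <- seq_len(n)    # 1 2 3
b <- sequence(n)   # 1 2 3, same result for a single count

# sequence() also takes several counts at once and restarts at 1 for each.
runs <- sequence(c(2L, 3L))
runs
# [1] 1 2 1 2 3
```

So within `DT[, ID := ..., by = IDFAM]` either function should produce the same IDs.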

score 4 · Answer 2 · answered Jan 24 '17 at 18:03

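With dplyr 0.5 you can use the group_indices function. Although it does not work inside mutate, the following approach is straightforward. Note that it numbers the groups (one ID per distinct IDFAM value) rather than the rows within each group: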

df$id <- df %>% group_indices(IDFAM)
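To compare a per-group identifier with a within-group counter, here is a base R sketch that puts both side by side. It is a stand-in for illustration, not how dplyr computes its result; `toy`, `group_id` and `within_id` are hypothetical names:

```r
# Hypothetical toy data with two families.
toy <- data.frame(IDFAM = c("b", "a", "b", "a", "a"),
                  stringsAsFactors = FALSE)

# One number per distinct IDFAM, following the sorted order of the values.
toy$group_id <- as.integer(factor(toy$IDFAM))

# A counter that restarts at 1 inside each IDFAM group.
toy$within_id <- ave(seq_len(nrow(toy)), toy$IDFAM, FUN = seq_along)

toy
#   IDFAM group_id within_id
# 1     b        2         1
# 2     a        1         1
# 3     b        2         2
# 4     a        1         2
# 5     a        1         3
```

The table requested in the question corresponds to the `within_id` column.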

answered Jan 24 '17 at 18:03

Rodrigo Remedio

