I have a simple dataset with an id variable and a date variable, and I would like to create a counter variable (counter) that increments whenever date changes within an id. Assume the data is sorted by id and date, and that a specific date may appear any number of times within an id. This is easily done in other languages (SAS with retain, or Stata with by: and _n/_N), but I haven't found an efficient way to do it in R.
- Do not post your data as an image; please learn how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Dec 08 '15 at 19:27
- Isn't this just `as.numeric(factor(df1$date, unique(df1$date)))` by id? – rawr Dec 08 '15 at 20:03
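The one-liner in the comment above has to be applied separately to each id. A minimal base R sketch of one way to do that follows; the example data frame and the helper name `per_id` are assumptions made for illustration, not something given in the question.

```r
# Example data, assumed to mirror the question's layout (sorted by id, then date)
df1 <- data.frame(id = rep(c(1, 2), c(4, 3)),
                  date = c("a", "a", "b", "b", "a", "a", "b"),
                  stringsAsFactors = FALSE)

# Number the distinct dates in order of first appearance, separately per id
per_id <- lapply(split(df1$date, df1$id),
                 function(x) as.numeric(factor(x, levels = unique(x))))

# Put the per-id results back into the original row order
df1$counter <- unsplit(per_id, df1$id)
df1
```

This relies on the data being sorted by date within id. If a date could reappear later within the same id, it would get its earlier number back rather than a new one.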
2 Answers
We can try dplyr, incrementing the counter whenever date differs from the previous row within each id:
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(counter = cumsum(c(TRUE, date[-1] != date[-n()])))
# id date counter
# (dbl) (chr) (int)
#1 1 a 1
#2 1 a 1
#3 1 b 2
#4 1 b 2
#5 2 a 1
#6 2 a 1
#7 2 b 2
data
df1 <- data.frame(id = rep(c(1, 2), c(4, 3)),
                  date = c('a', 'a', 'b', 'b', 'a', 'a', 'b'),
                  stringsAsFactors = FALSE)
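To see why the cumsum() expression works, it may help to run it on a single plain vector. The sketch below uses a made-up vector `x` as an assumption, with `length(x)` standing in for the `n()` that dplyr supplies inside a group.

```r
# A made-up vector standing in for the date values of one id
x <- c("a", "a", "b", "b", "c")

# Compare each element (from the 2nd on) with the one before it
changed <- x[-1] != x[-length(x)]    # FALSE TRUE FALSE TRUE

# The first row always starts a run, so prepend TRUE; the running total
# of the "changed" flags is the counter
counter <- cumsum(c(TRUE, changed))  # 1 1 2 2 3
counter
```

Each TRUE marks the first row of a new date, so the cumulative sum goes up by one exactly at those rows.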

akrun
You could also use data.table and its rleid function for this:
library(data.table)
dat <- data.table(id = rep(c(1, 2), c(4, 3)),
                  date = c('a', 'a', 'b', 'b', 'a', 'a', 'b'))
dat[, counter := rleid(date), by = id]
dat
id date counter
1: 1 a 1
2: 1 a 1
3: 1 b 2
4: 1 b 2
5: 2 a 1
6: 2 a 1
7: 2 b 2
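For readers without data.table, a run-length id for a single column can be approximated in base R with rle(). This is only a sketch: the helper name `run_id` is hypothetical, and the code is not how rleid is implemented internally.

```r
# Hypothetical helper: give each run of identical consecutive values its own number
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), times = r$lengths)
}

dat <- data.frame(id = rep(c(1, 2), c(4, 3)),
                  date = c("a", "a", "b", "b", "a", "a", "b"),
                  stringsAsFactors = FALSE)

# ave() works on row indices here so the result stays integer;
# the helper is applied to the dates of each id in turn
dat$counter <- ave(seq_len(nrow(dat)), dat$id,
                   FUN = function(i) run_id(dat$date[i]))
dat
```

Passing row indices to ave() rather than the date column itself avoids the counter being coerced to character.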

Heroka