I am trying to append a sequence number to a data frame, grouped by individual and date. For example, I want to turn this:
   x          y
1  A 2012-01-02
2  A 2012-02-03
3  A 2012-02-25
4  A 2012-03-04
5  B 2012-01-02
6  B 2012-02-03
7  C 2013-01-02
8  C 2012-02-03
9  C 2012-03-04
10 C 2012-04-05
into this:
   x          y v
1  A 2012-01-02 1
2  A 2012-02-03 2
3  A 2012-02-25 3
4  A 2012-03-04 4
5  B 2012-01-02 1
6  B 2012-02-03 2
7  C 2013-01-02 1
8  C 2012-02-03 2
9  C 2012-03-04 3
10 C 2012-04-05 4
where "x" is the individual, "y" is the date, and "v" is the appended sequence number.
I have had success on a small data frame using a for loop, as in this code:
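# Example data: individual (x) and date (y)
x <- c("A","A","A","A","B","B","C","C","C","C")
y <- as.Date(c("1/2/2012","2/3/2012","2/25/2012","3/4/2012","1/2/2012","2/3/2012",
               "1/2/2013","2/3/2012","3/4/2012","4/5/2012"), "%m/%d/%Y")
x
y
z <- data.frame(x, y)
# Start every row at 1, then count up while the individual is unchanged
z$v <- rep(1, nrow(z))
for (i in 2:nrow(z)) {
  if (z$x[i] == z$x[i-1]) {
    # Same individual as the previous row: continue the sequence
    z$v[i] <- z$v[i-1] + 1
  } else {
    # New individual: restart the sequence
    z$v[i] <- 1
  }
}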
but when I expand this to a much larger data frame (250K+ rows), the loop becomes impractically slow.
Any thoughts on how I can make this more efficient?
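For comparison, here is a minimal vectorized sketch of the same numbering using base R's ave() with seq_along(), which avoids the row-by-row loop. It assumes the rows are already in the order the numbering should follow within each individual. The name z2 is hypothetical and used only for this illustration.

```r
# Rebuild the example data from the question.
x <- c("A","A","A","A","B","B","C","C","C","C")
y <- as.Date(c("1/2/2012","2/3/2012","2/25/2012","3/4/2012","1/2/2012","2/3/2012",
               "1/2/2013","2/3/2012","3/4/2012","4/5/2012"), "%m/%d/%Y")
z2 <- data.frame(x, y)  # z2 is a hypothetical name used only in this sketch

# Number the rows within each individual, in their current row order.
# ave() splits the first argument by z2$x, applies FUN to each piece,
# and puts the results back in the original row positions.
z2$v <- ave(seq_len(nrow(z2)), z2$x, FUN = seq_along)

z2
```

One difference from the loop: ave() counts over all rows of an individual, whereas the loop restarts at 1 whenever x changes from the previous row. The two agree when each individual's rows are contiguous, as they are here. If the numbering should follow date order, the data frame would need to be sorted by x and y first.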