I am looking for a way to add a column (similar to a sequence column) to a data set that increments every time the value of a specific column changes. I found a very good solution in the question "Increment by 1 for every change in column in R", and it worked perfectly for most of the observations.
My data set has 18 columns and about 320,000 rows. A simplified version looks like this (the `sequence` column holds the desired result):
```r
df <- data.frame(var1     = c(1, 0, 1, 0, 0, 1, 0, 0, 0),
                 sequence = c(1, 2, 3, 4, 4, 5, 6, 6, 6))
```
I used the following piece of code and it worked well for my example above:
```r
df$seq <- cumsum(c(1, as.numeric(diff(df$var1)) != 0))
```
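As a quick check, the snippet below rebuilds the example data, applies the same expression with comments on what each step does, and compares the computed `seq` column with the expected `sequence` column.

```r
# Example data; `sequence` holds the expected result
df <- data.frame(var1     = c(1, 0, 1, 0, 0, 1, 0, 0, 0),
                 sequence = c(1, 2, 3, 4, 4, 5, 6, 6, 6))

# diff() gives the difference between consecutive rows, != 0 flags every
# change, the leading 1 starts the first run, and cumsum() numbers the runs
df$seq <- cumsum(c(1, as.numeric(diff(df$var1)) != 0))

# TRUE if the computed column matches the expected one
identical(df$seq, df$sequence)
```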
However, I noticed that my new column (`seq`) sometimes changes its value even though `var1` does not.
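One way to narrow this down is to check the type of `var1` and whether it contains `NA`s, then list the rows where `seq` increments and print `var1` at full precision around them. The data below is a hypothetical stand-in, not the real data set: row 3 holds `0.1 + 0.2`, which prints as `0.3` but is not exactly equal to it. The column names `var1` and `seq` are assumed to match the real data.

```r
# Hypothetical stand-in data: row 3 is 0.1 + 0.2, which prints as 0.3
# but is not exactly equal to 0.3
df <- data.frame(var1 = c(0.3, 0.3, 0.1 + 0.2, 0.3, 0.7))
df$seq <- cumsum(c(1, as.numeric(diff(df$var1)) != 0))

# Basic checks on the column itself
str(df$var1)    # storage type (numeric, integer, factor, ...)
anyNA(df$var1)  # an NA in var1 turns seq into NA from that row on

# Rows where seq increments, with var1 printed to 17 significant digits
changed <- which(diff(df$seq) != 0) + 1
data.frame(row  = changed,
           prev = sprintf("%.17g", df$var1[changed - 1]),
           curr = sprintf("%.17g", df$var1[changed]))
```

If `prev` and `curr` differ only in the last few digits for rows where the values look identical in the normal printout, the extra increments come from the data rather than from `cumsum()`.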
Is there something wrong with the `cumsum(c(1, as.numeric(diff(df$var1)) != 0))` command, or is this caused by a problem in my data?
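If `var1` in the real data holds non-integer doubles (for example, results of arithmetic), one possible cause is floating-point rounding: `diff()` then returns tiny non-zero values that `!= 0` counts as changes. A minimal sketch of a tolerance-based variant follows; the helper name `change_seq` and the default `tol` are illustrative assumptions, not part of the original solution.

```r
# Hypothetical data: rows 1-3 are meant to be the same value, but row 2
# carries rounding noise (0.1 + 0.2 is not exactly 0.3)
var1 <- c(0.3, 0.1 + 0.2, 0.3, 0.7, 0.7)

# Exact comparison: the rounding noise counts as a change
exact_seq <- cumsum(c(1, diff(var1) != 0))

# Hypothetical helper: differences up to `tol` count as "no change".
# `tol` is an assumed threshold; choose one that suits the scale of var1.
change_seq <- function(v, tol = sqrt(.Machine$double.eps)) {
  cumsum(c(1, abs(diff(v)) > tol))
}
tol_seq <- change_seq(var1)

exact_seq  # 1 2 3 4 4
tol_seq    # 1 1 1 2 2
```

Because this compares only consecutive rows, a slow drift made of steps smaller than `tol` would go unnoticed; rounding the column first, for example `diff(round(v, 8))`, is an alternative when the meaningful precision of `var1` is known.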
As I am quite new to R, I would be grateful if somebody could help me with this.