I am fairly new to programming (loops etc.), and I would be grateful for an opinion on whether my approach is fine as it is, or whether it would definitely need to be optimized before being used on a much bigger sample.
Currently I have approximately 20,000 observations, and one of the columns is a receipt ID. What I would like to achieve is to assign each row to a group consisting of IDs that ascend in steps of n + 1. Whenever this rule is broken, a new group should be started, and that group continues until the rule is broken again.
To illustrate, let's say I have this table (an important note is that IDs are not necessarily unique and can repeat, like ID 10 in my example):
MyTable <- data.frame(ID = c(1,2,3,4,6,7,8,10,10,11,17,18,19,200,201,202,2010,2011,2013))
MyTable
ID
1
2
3
4
6
7
8
10
10
11
17
18
19
200
201
202
2010
2011
2013
The result of my grouping should be the following:
ID GROUP
1 1
2 1
3 1
4 1
6 2
7 2
8 2
10 3
10 3
11 3
17 4
18 4
19 4
200 5
201 5
202 5
2010 6
2011 6
2013 7
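Put differently, as I understand the rule: a new group starts whenever a sorted ID jumps by more than 1, while a repeated ID stays in the current group. As a sanity check of that reading, a minimal base R sketch (assuming the IDs are already sorted ascending) that reproduces the GROUP column above would be:
# assuming MyTable$ID is already sorted ascending:
# a new group starts whenever the gap to the previous ID is larger than 1;
# a gap of 0 (repeated ID) or 1 keeps the current group
MyTable$GROUP <- cumsum(c(1, diff(MyTable$ID) > 1))
With the example data above this gives exactly the GROUP column shown.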
I used dplyr to order the IDs in ascending order. Then I created the variable MyTable$GROUP, which I simply filled with 1's.
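The ordering step itself is not shown in the code below; it was roughly this (a sketch, assuming a plain dplyr::arrange call):
library(dplyr)
# sort by receipt ID in ascending order
MyTable <- MyTable %>% arrange(ID)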
MyTable$GROUP <- rep(1, length(MyTable$ID))

for (i in 2:length(MyTable$ID)) {
  # same group if the ID increases by exactly 1 or repeats
  if (MyTable$ID[i] == MyTable$ID[i - 1] + 1 | MyTable$ID[i] == MyTable$ID[i - 1]) {
    MyTable$GROUP[i] <- MyTable$GROUP[i - 1]
  } else {
    # otherwise start a new group
    MyTable$GROUP[i] <- MyTable$GROUP[i - 1] + 1
  }
}
This code worked for me, and I got the results fairly easily. However, I wonder whether, in the eyes of more experienced programmers, this piece of code would be considered "bad", "average", "good", or whatever rating you come up with.
EDIT: I am sure this topic has been touched on already; I am not arguing against that. The main difference is that I would like to focus on optimization here and see whether my approach meets standards.
Thanks!