I have a data.table object with 5M rows. It may look like this:
    csvdata <- data.table(timestamp = c(1:6),
                          signal.a = c(12, 12, 13, 12, 12, 14),
                          signal.b = c(7, 7, 7, 7, 8, 8))
    timestamp signal.a signal.b
            1       12        7
            2       12        7
            3       13        7
            4       12        7
            5       12        8
            6       14        8
What I am trying to do is remove every row that does not register any signal change compared with the row before it. Row 2 should be deleted, because neither signal.a nor signal.b changed relative to row 1. So I would like to end up with this:
    timestamp signal.a signal.b
            1       12        7
            3       13        7
            4       12        7
            5       12        8
            6       14        8
I have little experience in R, so I tried the usual approach of a for-loop, intending to mark each unchanged row for deletion and later keep only the unmarked rows:
    for (i in 1:nrow(csvdata)) {
      if (i > 1 && csvdata[i]$signal.a == csvdata[i-1]$signal.a &&
          csvdata[i]$signal.b == csvdata[i-1]$signal.b) {
        csvdata[i]$Drop <- 1
      }
    }
The code seems to work, but with 5M rows it takes forever to run (2h and counting). Is there a more efficient solution?
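One vectorized direction that might avoid the row-by-row loop is to compare each column with a lagged copy of itself. The following is only a minimal sketch, not a benchmarked solution: it assumes data.table's `shift()` (which lags a vector by one position by default) and assumes the signal columns contain no `NA` values. The names `changed` and `result` are introduced here purely for illustration.

```r
library(data.table)

csvdata <- data.table(timestamp = 1:6,
                      signal.a = c(12, 12, 13, 12, 12, 14),
                      signal.b = c(7, 7, 7, 7, 8, 8))

# TRUE where at least one signal differs from the previous row.
# shift() returns NA for the first row (it has no predecessor),
# so that comparison is NA and the first row is kept explicitly.
# Assumption: the signal columns themselves contain no NA values.
changed <- csvdata[, signal.a != shift(signal.a) | signal.b != shift(signal.b)]
changed[1] <- TRUE

# Keep only the rows where something changed.
result <- csvdata[changed]
print(result)
```

On the six-row example this keeps timestamps 1, 3, 4, 5 and 6. Because the comparison runs over whole columns at once rather than indexing the table once per row, it should scale far better than the loop, though the actual timing on 5M rows is untested here. An equivalent formulation, assuming data.table's `rleid()`, would be `csvdata[!duplicated(rleid(signal.a, signal.b))]`, which keeps the first row of each run of identical signal pairs.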