I need to process a dataset sequentially, where each step uses the results of the previous steps.
As I am familiar with R, I decided to use it, and one of the solutions I tried is a for loop.
The dataset I loop through has around 8 million rows and 4 columns.
I use a data.table, and the variables are of type character, e.g. "XXXXXXXXX".
I tried looping through it, but each iteration takes approx. 0.7 seconds, of which the "<-" assignment alone takes about half a second.
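A minimal sketch of how such a per-assignment timing could be reproduced on a smaller table. The name `dt_small` and the size `n` are made up for illustration, and this is not necessarily how the 0.7 s figure above was measured. It also times `data.table::set()`, which updates a column by reference, for comparison.

```r
library(data.table)

# Small stand-in table (size chosen for illustration only)
n <- 1e5
dt_small <- data.table(
  m  = paste0("XXXXXXXXXX", seq_len(n)),
  ma = paste0("ZXXXXXXXXX", seq_len(n))
)

# Time 100 single-element assignments via `$<-`
t_assign <- system.time(
  for (i in 1:100) dt_small$ma[i] <- dt_small$m[i]
)

# Time the same 100 assignments via set(), which modifies dt_small in place
t_set <- system.time(
  for (i in 1:100) set(dt_small, i = i, j = "ma", value = dt_small$m[i])
)

# Elapsed seconds for each variant
print(c(assign = t_assign[["elapsed"]], set = t_set[["elapsed"]]))
```

The absolute numbers will depend on the machine and the table size, so they are best read relative to each other.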
Can anybody recommend a better technique, potentially Rcpp, apply, or something else?
Thanks for your support,
Holger
# Negated %in%: TRUE where x is not found in y
`%!in%` <- function(x, y) !(x %in% y)
library(data.table)
dt_loop <- data.table(
  paste0("XXXXXXXXXX", 1:80000000),
  paste0("YXXXXXXXXX", 1:80000000),
  paste0("ZXXXXXXXXX", 1:80000000),
  paste0("AXXXXXXXXX", 1:80000000)
)
# Rename the default columns V1..V4 by reference
setnames(dt_loop, c("V1", "V2", "V3", "V4"), c("m", "c", "ma", "unused"))
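# Walk the rows in order; each decision depends on the values already written to ma[1:i].
for (i in seq_len(nrow(dt_loop))) {
  m <- dt_loop$m[i]
  c <- dt_loop$c[i]
  if (m %!in% dt_loop$ma[1:i] && c %!in% dt_loop$ma[1:i]) {
    # Neither value has been seen so far: keep m.
    dt_loop$ma[i] <- m
  } else {
    if (m %in% dt_loop$ma[1:i]) {
      dt_loop$ma[i] <- m
    } else {
      # Only c has been seen so far.
      dt_loop$ma[i] <- c
    }
  }
}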
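One direction that might avoid both the repeated `%in%` scan over `ma[1:i]` and the per-row `<-` on the table is sketched below. It is only a sketch under assumptions: `c` is taken from the `c` column, all strings are non-empty and non-NA, and the helper name `resolve_ma` and the demo table `dt_demo` are hypothetical. An environment serves as a hash set of the values already written, and the result is written back once with `set()`.

```r
library(data.table)

# Hypothetical helper: applies the loop's decision rule to plain character vectors.
# Assumes non-empty, non-NA strings (they are used as environment keys).
resolve_ma <- function(m, cc, ma) {
  seen <- new.env(hash = TRUE)      # values already written to ma[1:(i - 1)]
  out <- character(length(ma))
  for (i in seq_along(ma)) {
    # ma[1:i] consists of the values written so far plus the still-original ma[i]
    m_in <- exists(m[i], envir = seen, inherits = FALSE) || m[i] == ma[i]
    c_in <- exists(cc[i], envir = seen, inherits = FALSE) || cc[i] == ma[i]
    # The loop picks c only when c has been seen and m has not; otherwise m
    out[i] <- if (!m_in && c_in) cc[i] else m[i]
    assign(out[i], TRUE, envir = seen)
  }
  out
}

# Tiny demo table with made-up values
dt_demo <- data.table(
  m  = c("a", "b", "c", "d"),
  c  = c("x", "a", "b", "y"),
  ma = c("p", "q", "r", "s")
)

# Write the whole column back in one step, by reference
set(dt_demo, j = "ma", value = resolve_ma(dt_demo$m, dt_demo$c, dt_demo$ma))
print(dt_demo)
```

The loop is still sequential, since each row depends on earlier results, but each membership test becomes a hash lookup instead of a scan, and the table is modified only once. The same structure could presumably be ported to Rcpp with a hash set if plain R remains too slow.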