Suppose I have two tables. One with appointments and second with receptions. Each table has filial ID, medic ID, start and end time (plan for appointments and fact for receptions) and some other data. I want to count how much of appointments have receptions inside time interval of appointment period. Reception fact can begin before appointment start time, after, it can be inside app. interval, etc.
Below I made two tables. One for appointments and one for receptions. I wrote nested loop but it works very slow. My tables contains approximately 50 mio rows each. I need fast solution for this problem. How can I do this without loop? Thanks in advance!
library(data.table)
date <- as.POSIXct('2015-01-01 14:30:00')
# appointments data table
app <- data.table(med.id = 1:10,
filial.id = rep(c(100,200), each = 5),
start.time = rep(seq(date, length.out = 5, by = "hours"),2),
end.time = rep(seq(date+3599, length.out = 5, by = "hours"),2),
A = rnorm(10))
# receptions data table
re <- data.table(med.id = c(1,11,3,4,15,6,7),
filial.id = c(rep(100, 5), 200,200),
start.time = as.POSIXct(paste(rep('2015-01-01 ',7), c('14:25:00', '14:25:00','16:32:00', '17:25:00', '16:10:00', '15:35:00','15:50:00'))),
end.time = as.POSIXct(paste(rep('2015-01-01 ',7), c('15:25:00', '15:20:00','17:36:00', '18:40:00', '16:10:00', '15:49:00','16:12:00'))),
B = rnorm(7))
app$count <- 0
for (i in 1:dim(app)[1]){
for (j in 1:dim(re)[1]){
if ((app$med.id[i] == re$med.id[j]) & # med.id is equal and
app$filial.id[i] == re$filial.id[j]) { # filial.id is equal
if ((re$start.time[j] < app$start.time[i]) & (re$end.time[j] > app$start.time[i])) { # reception starts before appointment start time and ends after appointment start time OR
app$count[i] <- app$count[i] + 1
} else if ((re$start.time[j] < app$end.time[i]) & (re$start.time[j] > app$start.time[i])) { # reception starts before appointment end time and after app. start time
app$count[i] <- app$count[i] + 1
}
}
}
}