I have 2 tibble data frames that I am trying to reconcile. The first tibble has over a million observations, the first few rows are as follows:
data
ID Time(Converted to number)
1 23160
1 23161
1 23162
1 23163
1 23164
1 23165
2 24251
2 24252
The second tibble is a lookup table (that has information of a particular event that has occurred), simplified version as follows:
lookup_table
ID Event_Time Event_Indicator Number_of_Cumulative_Events
1 23162 1 1
1 23164 1 2
2 24255 1 1
2 24280 0 1
I would like to create a 3rd column in the first tibble, such that it shows the number of cumulative events at that time of the observation. The 3rd column in the above example would therefore be:
ID Time(Converted to number) Number
1 23160 0
1 23161 0
1 23162 1
1 23163 1
1 23164 2
1 23165 2
2 24251 0
2 24252 0
I am trying to avoid having to loop through the millions of observations to compare each observation's time to the Event_Time in the lookup table because of computation time.
However, I am not sure how to go about doing this without the use of a loop. The issue is that the lookup_table contains some IDs multiple times, if all IDs only appeared in the lookup_table only once, then I could do:
data$Event_Time <- lookup_table[match(data$ID, lookup_table$ID),"Event_Time"]
data$Number <- data %>% mutate(ifelse(Time >= Event_Time,1,0))
Any ideas how I could avoid the use of a loop and yet apply the lookup conditions for each observation? Thank you.
Edit: I am not trying to join the tables, but more of comparing the time columns in the lookup_table and data table to obtain my desired column. Example, if I were to write an inefficient loop function, it would be:
for (i in 1:nrow(data)) {
data$Number[i] <- subset(lookup_table,ID == data$ID[i])[max(which
(data$Time[i] >= lookup_table$Event_Time)), "Number_of_Cumulative_Events"]
}