I have a question that I have answer two but think it is inefficient and so want a better way. I am querying a database using DBplyr and joining tables based upon a few ID columns.
For example, I have one table (Table_1) in the database with an org_ID and a sender_ID, the first is the number for the organization in a given geography and the second a number for organizations with which they interacted in a given geography. So org_ID = 1 in geography 2, org_ID 1 in geography 3, and so forth.
I also have a single identification table (Table_2) with an org_ID column and a corresponding geography. There is not a sender_ID in Table_2, but the sender_ID in Table_2 does match org_ID/geography pairs in Table_1.
I want to combine these tables, but I need to use the Table_1 twice, effectively. What works, but is slow is the following
df <- Table_1 %>%
left_join(Table_2, by=c('org_ID' = "org_ID" , 'geography' = 'geography' ))%>%
left_join(Table_2, by=c('sender_ID' = "org_ID" , 'geography' = 'geography')) %>%
#various grouping and summary commands %>%
collect()
Any ideas for better ways?