I'm trying to fill in a large Matrix with 217Mio elements. I tested my method with a small sample and it did what i wanted. However using the (almost) full sized data it takes for ever. My machine is calculating since almost 48 hours now. I'm looking for help to speed things up.
First dataframe contains origins (population) and the second destinations(supply location). Each origin and destination contains a ID indicating in which zone (census track) they are in. The matrix contains traveltimes for all possible combinations for travel from one zone to another.
first dataframe containing origins sptl_pop19_intrsct_zns
pop_index | ID |
---|---|
P_000001 | 1000101 |
P_000002 | 1000101 |
... | ... |
second dataframe containing destinations sptl_sup_intrsct_zns
sup_index | ID |
---|---|
S_0001 | 2000101 |
S_0002 | 2000101 |
... | ... |
these two dateframe contain more than just these two variables that aren't relevant for now.
Matrix containing traveltimes (mins) for all zone combinations NPVM_mtrx
1000101 | ... | 2000101 | ... | |
---|---|---|---|---|
1000101 | 10 | ... | 60 | ... |
... | ... | ... | ... | ... |
2000101 | 60 | ... | 14 | ... |
my goal is to populate a matrix in this form demsup_mtrx
P_000001 | P_000002 | ... | |
---|---|---|---|
S_0001 | 60 | 60 | ... |
S_0002 | 60 | 60 | ... |
... | ... | ... | ... |
I generated this matrix already but completely empty
For now I'm using two for loops to iterate through the empty matrix looking up the zone IDs for each origin-supply-pair.
for (r in rownames(demsup_mtrx)){
# Get Zone ID from Supply Origin
Sup_znID <- filter(sptl_sup_intrsct_zns, sup_index == r )%>%
pull(ID)
for (c in colnames(demsup_mtrx)){
# Get Zone ID from Demand Origin
Pop_znID <- filter(sptl_pop19_intrsct_zns, pop_index == c )%>%
pull(ID)
# Get travel duration for Pop-Sup pair
Pop_Sup_duration <- NPVM_mtrx[as.character(Pop_znID),as.character(Sup_znID)]
# Fill in Matrix with Traveltime
demsup_mtrx[r,c] = Pop_Sup_duration
}}
For now I'm suspecting that the dplyr filter functions are the bottleneck but im not sure.