Note: I have working code, just wondering if there are better methods to implement. My goal to to try to minimize run time if possible. Code is in R currently, would swapping to Python or C++ help speed things up in this case?
I have two tables: one is a table of payments and the other a table of contracts. Each table, at minimum, shares: account numbers, company codes, start dates and end dates (payment period dates in the payment table and contract period dates in the contract table). My objective is to create a third table of payments that cannot be matched/allocated to a contract. A payment matches/allocates to a contract if all three of the following conditions are true:
- Both payment and contract have matching account number
- Both payment and contract have matching company codes
- Payment start date falls within contract start and end dates
Payment Table
ID | AccountNumber | CompanyCode | StartDate | EndDate |
---|---|---|---|---|
1 | 123 | 35 | 1Jan22 | 1Feb22 |
2 | 123 | 37 | 1Mar19 | 1May19 |
3 | 123 | 35 | 1Apr23 | 1Jun23 |
4 | 123 | 35 | 1Mar22 | 1Apr22 |
Contract Table
ID | Account Number | Company Code | StartDate | EndDate |
---|---|---|---|---|
1 | 123 | 35 | 1Dec21 | 1Dec22 |
2 | 123 | 37 | 1Dec18 | 1Dec19 |
Sample data for the purpose of finding answers:
# Create Payment Table
PaymentTable <- data.frame(
ID = c(1, 2, 3, 4),
AccountNumber = c(123, 123, 123, 123),
CompanyCode = c(35, 37, 35, 35),
StartDate = dmy(c("1Jan22", "1Mar19", "1Apr23", "1Mar22")),
EndDate = dmy(c("1Feb22", "1May19", "1Jun23", "1Apr22"))
)
# Create Contract Table
ContractTable <- data.frame(
ID = c(1, 2),
AccountNumber = c(123, 123),
CompanyCode = c(35, 37),
StartDate = dmy(c("1Dec21", "1Dec18")),
EndDate = dmy(c("1Dec22", "1Dec19"))
)
In the above example data set, Payment IDs 1 and 4 match to Contract ID 1. Payment ID 3 matches to Contract ID 2. Payment ID 3 does not match to any contract.
So the conditions are:
- Check if for any ID of the payment dates are in between contract ID's dates
- Check if IDs, AccountNumber, and CompanyCode and are matching in both the tables
When above two conditions are met, merge the table.
I have simple working R code via what I call the naïve method of nested for loops. The outer for loop iterating thru every line of payment and the inner for loop iterating thru every contract. It takes about 10ish minutes for the code to complete 1/5 of all the lines of payments I have. Are there any cleaver/advanced algorithms that can be applied in this situation to improve computation speed?