Is there a more efficient method to match payments to respective contracts besides nested for loops?

Question

Note: I have working code, just wondering if there are better methods to implement. My goal to to try to minimize run time if possible. Code is in R currently, would swapping to Python or C++ help speed things up in this case?

I have two tables: one is a table of payments and the other a table of contracts. Each table, at minimum, shares: account numbers, company codes, start dates and end dates (payment period dates in the payment table and contract period dates in the contract table). My objective is to create a third table of payments that cannot be matched/allocated to a contract. A payment matches/allocates to a contract if all three of the following conditions are true:

Both payment and contract have matching account number
Both payment and contract have matching company codes
Payment start date falls within contract start and end dates

Payment Table

ID	AccountNumber	CompanyCode	StartDate	EndDate
1	123	35	1Jan22	1Feb22
2	123	37	1Mar19	1May19
3	123	35	1Apr23	1Jun23
4	123	35	1Mar22	1Apr22

Contract Table

ID	Account Number	Company Code	StartDate	EndDate
1	123	35	1Dec21	1Dec22
2	123	37	1Dec18	1Dec19

Sample data for the purpose of finding answers:

# Create Payment Table
PaymentTable <- data.frame(
  ID = c(1, 2, 3, 4),
  AccountNumber = c(123, 123, 123, 123),
  CompanyCode = c(35, 37, 35, 35),
  StartDate = dmy(c("1Jan22", "1Mar19", "1Apr23", "1Mar22")),
  EndDate = dmy(c("1Feb22", "1May19", "1Jun23", "1Apr22"))
)

# Create Contract Table
ContractTable <- data.frame(
  ID = c(1, 2),
  AccountNumber = c(123, 123),
  CompanyCode = c(35, 37),
  StartDate = dmy(c("1Dec21", "1Dec18")),
  EndDate = dmy(c("1Dec22", "1Dec19"))
)

In the above example data set, Payment IDs 1 and 4 match to Contract ID 1. Payment ID 3 matches to Contract ID 2. Payment ID 3 does not match to any contract.

So the conditions are:

Check if for any ID of the payment dates are in between contract ID's dates
Check if IDs, AccountNumber, and CompanyCode and are matching in both the tables

When above two conditions are met, merge the table.

I have simple working R code via what I call the naïve method of nested for loops. The outer for loop iterating thru every line of payment and the inner for loop iterating thru every contract. It takes about 10ish minutes for the code to complete 1/5 of all the lines of payments I have. Are there any cleaver/advanced algorithms that can be applied in this situation to improve computation speed?

Sounds like this could be achieved using `merge` or `dplyr` joins, followed by a filter for the start date condition. But it will be easier to help if you [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including small representative datasets and example output in a plain text format, as well as all the relevant code. — neilfws, Apr 13 '23 at 04:23
Sounds like you want an overlap join. See `?dplyr::join_by`. — Ritchie Sacramento, Apr 13 '23 at 04:24
Also see https://stackoverflow.com/q/1299871/17303805 for an overview of merge operations in R. — zephryl, Apr 13 '23 at 04:29
I think people who closed this post did not understand the exact requirements... Choy asked for certain conditions to be checked before merging the two tables.. — Manoj Kumar, Apr 13 '23 at 05:30
@ManojKumar - I understood the requirements. They are met via the join specifications of an overlap join. No need to 'pre-check' the data. — Ritchie Sacramento, Apr 13 '23 at 05:55
`PaymentTable %>% inner_join(ContractTable, join_by(ID, AccountNumber, CompanyCode, StartDate >= StartDate, StartDate <= EndDate))` — Jon Spring, Apr 13 '23 at 07:06

Is there a more efficient method to match payments to respective contracts besides nested for loops?

0 Answers0