I was asked to merge two dataframes: the first contains sales orders (1M rows) and the second contains discounts applied to those sales orders, defined by a date range.
Both are joined by an ID (A975 ID), but I can't find an efficient way to do the merge without running out of memory.
My idea so far is to do an "outer" merge and, once that's done, filter the "Sales Order Date" column to the rows whose value falls within the date range "DATAB" (start) to "DATBI" (end).
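A minimal sketch of that idea in pandas, using the column names from above (the tiny sample frames and the "Discount" column are made up for illustration):

```python
import pandas as pd

# Small illustrative frames; the real sales table has ~1M rows.
sales = pd.DataFrame({
    "A975 ID": [1, 1, 2],
    "Sales Order Date": pd.to_datetime(["2023-01-15", "2023-06-01", "2023-03-10"]),
})
discounts = pd.DataFrame({
    "A975 ID": [1, 1, 2],
    "DATAB": pd.to_datetime(["2023-01-01", "2023-05-01", "2023-01-01"]),
    "DATBI": pd.to_datetime(["2023-01-31", "2023-05-31", "2023-12-31"]),
    "Discount": [0.05, 0.10, 0.02],
})

# Merge on the ID (this is the step that multiplies rows), then keep
# only the rows whose order date falls inside [DATAB, DATBI].
merged = sales.merge(discounts, on="A975 ID", how="left")
in_range = merged["Sales Order Date"].between(merged["DATAB"], merged["DATBI"])
result = merged[in_range]
```

Whether this fits in memory depends on how many discount rows exist per ID, since the intermediate `merged` frame holds every ID combination before the date filter is applied.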
I was able to do it in Power Query with "SelectRows", but it takes too long. If I can replicate the same procedure in Python and save a few minutes of processing, that would be more than enough.
I know that an outer join generates tons of duplicate rows with only the date changed, but I don't know what else to do.