
I was asked to do a merge between two dataframes. The first contains sales orders (1M rows) and the second contains discounts applied to those sales orders, each established according to a date range.

Both are joined by an ID (A975 ID), but I can't find an efficient way to do that merge without running out of memory.

My idea so far is to do an "outer" merge and, once that's done, filter the "Sales Order Date" column to the values that fall within the date range "DATAB" (start) to "DATBI" (end).

I was able to do it in PowerQuery with "SelectRows", but it takes too long. If I can replicate the same procedure in Python and save a few minutes of processing, that would be more than enough.

I know that an outer join generates tons of duplicated rows with only the date changed, but I don't know what else to do.

[Image: Table to merge]

Ravindra S
  • See if any of the suggestions (in the comment section) here help: https://stackoverflow.com/questions/40860457/improve-pandas-merge-performance – teedak8s Jun 24 '22 at 05:36
  • You can also investigate the dask module, if you are allowed to use one in your procedures. – NoobVB Jun 24 '22 at 08:00
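Following up on the Dask comment above: a minimal sketch of how the merge-then-filter could be run out-of-core with dask.dataframe. It assumes the frame and column names used in the answer below (all_together, A904, ID A904, ERDAT, DATAB, DATBI); adjust to the real schema.

    import dask.dataframe as dd

    # Convert both pandas frames to partitioned Dask frames so the merge
    # is evaluated lazily, partition by partition, instead of all at once.
    orders = dd.from_pandas(all_together, npartitions=16)
    discounts = dd.from_pandas(A904, npartitions=4)

    result = (
        dd.merge(orders, discounts, how="inner", on="ID A904")
          .query("DATAB <= ERDAT <= DATBI")
          .compute()  # materialize back into a regular pandas DataFrame
    )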

1 Answer


The solution was a merge followed by DataFrame.query:

import pandas as pd

# Outer-merge on the shared key, then keep only the rows whose order
# date (ERDAT) falls inside the discount validity range DATAB-DATBI.
df = pd.merge(all_together, A904, how="outer", on="ID A904")\
         .query("DATAB <= ERDAT <= DATBI")
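One memory note: the date filter discards every row where DATAB, ERDAT, or DATBI is missing, so the unmatched rows an outer join adds never survive the query; how="inner" therefore yields the same final result with a much smaller intermediate. If even the inner merge is too large to hold at once, here is a sketch of processing the orders frame in slices (same names as above; the slice count of 20 is an arbitrary assumption):

    import numpy as np
    import pandas as pd

    # Merge and filter the big orders frame one slice at a time so the
    # intermediate merged frame never holds all duplicated rows at once.
    pieces = []
    for part in np.array_split(all_together, 20):  # ~50k rows per slice
        merged = part.merge(A904, how="inner", on="ID A904")
        pieces.append(merged.query("DATAB <= ERDAT <= DATBI"))

    df = pd.concat(pieces, ignore_index=True)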