The aim:
I want to merge two dataframes on the basis of the unique number and the date matching within +/-7 days
The data:
df1
Number Report DateDone
1 some words 13/1/2021
1 more stuff 21/8/2021
44 balbla 11/4/2020
2 gobbledy bla 01/03/2019
44 rara rasputin 13/10/2021
44 tree frogs 11/10/2010
df2
Number Report DateDone
1 hocum poklum 11/1/2021
1 mjimmeny cricket 21/8/2021
44 it wasnt me 11/2/2020
2 its not really 6/03/2019
44 im innocent 12/10/2021
44 bullfrogs 11/01/2010
The intended result
Number.df1 Report.df1 DateDone.df1 Number.df2 Report.df2 DateDone.df2
1 some words 13/1/2021 1 hocum poklum 11/1/2021
1 more stuff 21/8/2021 1 jimmeny cricket 21/8/2021
2 gobbledy bla 01/03/2019 2 its not really 6/03/2019
44 rara rasputin 13/10/2021 44 im innocent 12/10/2021
I was going to use a sql merge similar to one I found here but I am having difficulty knowing how to merge on the number and a date range. Do I need to calculate the 7 days before and after the DateDone in df1? Surely there is a more efficient way than having to calculate two new columns first?
qry = '''
select
df1.DateDone_start TermStart,
df1.DateDone_end TermEnd,
df2.DateDone df2Start,
df1.Number,
df2.Number
from
df1 join df2 on
date between df1.DateDone_start and df1.DateDone_end join df1 on
df1.Number = df2.Number
'''
df = pd.read_sql_query(qry, conn)