I've been going round StackOverflow looking at all the similar questions (here, here, here, here and here), and haven't found a satisfying answer, here's my solution.
First, the best answers I found:
use sqlite or equivalent, in memory, which makes it very easy to write an SQL query using the SQL "between" syntax on join. It works, but if you have strange datatypes in your dataframe it fails because it cannot convert them to SQL. See this answer
use some difficult numpy syntax, using piecewise(). But it's quite memory heavy. See this answer
use pd.IntervalIndex, although I am not sure it works in every case, see the comments at the bottom of this answer.
My answer:
Do the merge twice, basically.
- Merge the first time on the non-date columns
- Remove all the out-of-range date rows, keeping track of each row's index
- Merge a second time on the correct indexes
We want to merge these two DataFrames, df1 and df2:
df1 = pandas.DataFrame({'Code 1': ['001','002','003','005'],
'Code 2': ['192','193','193','195'],
'Date': pandas.date_range("2018-02-01", periods = 4, freq = "D")
})
df2 = pandas.DataFrame({'Code 1': ['002','003','005','006'],
'Code 2': ['193','193','195','195'],
'Date Start': pandas.to_datetime("2018-01-01"),
'Date End': pandas.to_datetime("2018-02-04")
})


And here's my solution:
#we want to use the original index for the join, so we add it as a column (or multiple columns if multiindex)
#also, we only need to keep the columns useful for the merge, to save memory
df1_ = df1[['Code 1', 'Code 2', 'Date']].reset_index()
df2_ = df2[['Code 1', 'Code 2', 'Date Start', 'Date End']].reset_index()
#we create a helper df, which at first merges df1 and df2 ignoring dates
df_h = df1_.merge(df2_, on = ['Code 1', 'Code 2'], how = 'inner', suffixes = ['_df1', '_df2'])
#then we filter the helper df for only the rows where the dates match
df_h = df_h[(df_h['Date'] > df_h['Date Start']) &
(df_h['Date'] < df_h['Date End'])]
#and we keep only the index columns of df1 and df2, which hold the information about which row in df1 merges on which row in df2
df_h = df_h[['index_df1','index_df2']].set_index('index_df1')
df2 = df2_.rename(columns = {'index': 'match_index'})
df_h = df1.join(df_h).rename(columns = {'index_df2': 'match_index'})
#finally our resulting DataFrame, which merges on "Code 1", "Code 2" and on the "match index"
df = df_h.merge(df2, on = ['Code 1', 'Code 2', 'match_index']).drop(['match_index'], axis = 1)
Result DataFrame
