I have two dataframes, df1 and df2:
import pandas as pd
import datetime
import numpy as np
df1 = pd.DataFrame()
df1["StartDate"] = pd.date_range(start=('2016/01/01'), end = ('2017/12/31'),
freq="6D")
df1["EndDate"] = df1["StartDate"] + datetime.timedelta(5)
df1["Value_three"] = pd.Series(np.random.randn(len(df.Date))+2)
df1 has a start date, end date and value (named value_three). df2 has a date, value_one and value two.
df2 = pd.DataFrame()
df2["Date"] = pd.date_range(start=('2016/01/01'), end=('2017/12/31'), freq="D")
df2["Value_one"] = pd.Series(np.random.randn(len(df.Date)))
df2["Value_two"] = pd.Series(np.random.randn(len(df.Date))+1)
What am I looking to achieve is something similar to the SUMIFS function in Excel:
In this case, I'd like add two columns to df1 that are values_1 and value_2 summed respectively where the "Date" column is df2 is between df1.StartDate and df1.EndDate.
Let's assume it is not possible to pre-aggregate this in Vertica/MSSQL/etc. We also cannot use resample as the start date and end date can be irregular (in this case, separated by 11 days, but the gap can be ever changing).
I've tried join, merge, melt, concat, and I am proving a failure! Hopefully someone can kindly assist.