I have two dataframes, test1 and test2. For each ID value in test2, I want to check the date in test2 and compare it to the date ranges (date1 to date2) for that same ID value in test1. If any of the dates in test2 fall within a date range in test1, sum the amount column and assign that sum as an additional column in test1.
Output:
So the new test1 df will have a column amount_sum, which is the sum of all amounts in test2 where the date is within the date range of test1 for that ID.
import random
import string

import numpy as np
import pandas as pd

# test1: one ID and one date range (date1 to date2) per row
test1 = pd.DataFrame({
    'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
    'date1':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100))) for n in range(100)],
    'date2':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(101, 200))) for n in range(100)]
})
# test2: one ID, one amount and one date per row
test2 = pd.DataFrame({
    'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
    'amount':[random.choice([1,2,3,5,10]) for n in range(100)],
    'date':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100))) for n in range(100)]
})
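
To make the expected output concrete, here is a minimal sketch on a few hand-made rows. The toy values are illustrative assumptions, not taken from the random data above. It pairs rows on ID with a merge and then filters by date, which is only one possible approach and may use a lot of memory on large frames. It also assumes the range bounds are inclusive and that a test1 row with no matching dates should get 0.

```python
import pandas as pd

# Hypothetical toy data, small enough to check by hand
test1 = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'bbb'],
    'date1': pd.to_datetime(['2018-01-01', '2018-06-01', '2018-01-01']),
    'date2': pd.to_datetime(['2018-03-01', '2018-08-01', '2018-02-01']),
})
test2 = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc'],
    'amount': [1, 2, 5, 10, 3],
    'date': pd.to_datetime(['2018-01-15', '2018-02-10', '2018-07-04',
                            '2018-05-01', '2018-01-10']),
})

# Pair every test1 row with the test2 rows that share its ID.
# reset_index() keeps the original test1 row label in a column named 'index'.
pairs = test1.reset_index().merge(test2, on='ID', how='left')

# Keep only the pairs whose date lies inside [date1, date2] (inclusive, assumed).
in_range = pairs[pairs['date'].between(pairs['date1'], pairs['date2'])]

# Sum the amounts per original test1 row; rows without any match get 0.
sums = in_range.groupby('index')['amount'].sum()
test1['amount_sum'] = sums.reindex(test1.index, fill_value=0)

print(test1)
```

With these toy rows, the first aaa range picks up the amounts 1 and 2, the second aaa range picks up 5, and the bbb range picks up nothing because its only date falls outside the range.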