Here is my dataframe:
import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': ['12/15/2011', '11/11/2001', '08/30/2015', '07/3/1999', '03/03/2000'],
                   'D': [1, 7, 3, 4, 5],
                   'F': ['12/1/2011', '10/1/2000', '8/15/2015', '12/1/2011', '12/1/2011']})
df['C'] = pd.to_datetime(df['C'])
df['F'] = pd.to_datetime(df['F'])
I would like to group by column B and then, for each group, check whether column C contains a date within 30 days of the F value in the same row. I want to get back an indicator column that is set for the whole group, which should look like:
df['indicator'] = [1, 0, 1, 1, 1]
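One way to get this indicator, sketched under the assumption that "within 30 days" means the absolute gap between C and F in the same row is at most 30 days (inclusive), is to do the row-level test first and only then group: groupby(...).transform('any') spreads each group's True/False result to all of that group's rows. The explicit format='%m/%d/%Y' in to_datetime is an added assumption so parsing does not depend on pandas' format inference.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': ['12/15/2011', '11/11/2001', '08/30/2015', '07/3/1999', '03/03/2000'],
                   'D': [1, 7, 3, 4, 5],
                   'F': ['12/1/2011', '10/1/2000', '8/15/2015', '12/1/2011', '12/1/2011']})
# Assumed month/day/year format, stated explicitly.
df['C'] = pd.to_datetime(df['C'], format='%m/%d/%Y')
df['F'] = pd.to_datetime(df['F'], format='%m/%d/%Y')

# Row-level test: is C within 30 days of F in either direction (assumed meaning)?
within_30 = (df['C'] - df['F']).abs() <= pd.Timedelta(days=30)

# Group by B and broadcast "any row in this group passed" back to every row.
df['indicator'] = within_30.groupby(df['B']).transform('any').astype('int64')
print(df['indicator'].tolist())  # [1, 0, 1, 1, 1]
```

If the intended rule is one-sided, as in the loop below (C earlier than F + 30 days), replace the abs() line with df['C'] < df['F'] + pd.Timedelta(days=30); for this data the indicator comes out the same.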
Here is what I tried:
def date_test(x, y):
    # True if any row in the group has C earlier than F + 30 days
    result = False
    for i in x.index:
        if x[i] < y[i] + pd.Timedelta(days=30):
            result = True
    return result
df['indicator'] = df.groupby('B')['C','F'].transform(date_test).astype('int64')
But I got back: TypeError: Transform function invalid for data types
So I guess I cannot pass two columns to a transform function. Any thoughts?
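transform only ever calls the function with a single argument, so a two-argument function like date_test cannot work there. A minimal sketch of one workaround, keeping the one-sided rule from the attempt (C < F + 30 days): let the function receive each group's whole sub-DataFrame through groupby(...).apply, which gives one flag per group, then map those flags back onto the rows with Series.map. The helper name date_test_group is hypothetical, and the explicit date format is an added assumption.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': ['12/15/2011', '11/11/2001', '08/30/2015', '07/3/1999', '03/03/2000'],
                   'D': [1, 7, 3, 4, 5],
                   'F': ['12/1/2011', '10/1/2000', '8/15/2015', '12/1/2011', '12/1/2011']})
df['C'] = pd.to_datetime(df['C'], format='%m/%d/%Y')
df['F'] = pd.to_datetime(df['F'], format='%m/%d/%Y')


def date_test_group(g):
    """Hypothetical helper: g is one group's sub-DataFrame, so C and F are both available.

    Same one-sided rule as the original loop: True if any row has C < F + 30 days.
    """
    return bool((g['C'] < g['F'] + pd.Timedelta(days=30)).any())


# One boolean per value of B (a Series indexed by B).
flags = df.groupby('B')[['C', 'F']].apply(date_test_group)

# Broadcast each group's flag back to its rows by looking up B.
df['indicator'] = df['B'].map(flags).astype('int64')
print(df['indicator'].tolist())  # [1, 0, 1, 1, 1]
```

Selecting [['C', 'F']] with a list (rather than ['C','F']) keeps the grouping column out of what the function sees.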