Here is my dataframe:

import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': ['12/15/2011', '11/11/2001', '08/30/2015', '07/3/1999', '03/03/2000'],
                   'D': [1, 7, 3, 4, 5],
                   'F': ['12/1/2011','10/1/2000','8/15/2015','12/1/2011','12/1/2011'] })
df['C'] = pd.to_datetime(df['C'])
df['F'] = pd.to_datetime(df['F'])

I would like to group by column B and then, for each group, check whether column C contains a date within 30 days of column F. I want to get back an indicator column for the whole group, which should look like this:

df['indicator'] = [1,0,1,1,1]

Here is what I tried:

def date_test(x, y):

    result = False
    for i in x.index:
        if x[i]<y[i]+ pd.Timedelta(days=30):
            result = True

    return result

df['indicator'] = df.groupby('B')['C','F'].transform(date_test).astype('int64')

But I got back TypeError: Transform function invalid for data types.

So I guess I cannot pass two columns to the transform function. Any thoughts?

user1700890
2 Answers

I think you're right: .transform() applies the function you pass to each column (C and F in this case) separately, so the function never sees both columns at once. See here for more details.

However, I think you can use .apply() and get the results you want:

>>> dfGroup = df.groupby('B')
>>> dfGroup.apply(lambda x: x['C'] < x['F'] + pd.Timedelta(days=30))
B
Ar  0     True
    3     True
    4     True
Br  1    False
Cr  2     True
dtype: bool
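
If the goal is one indicator value shared by every row of a group, a possible sketch is to compute the row-wise comparison first and then broadcast `any` across each group of B. This is not the code from the answer above. It assumes the one-sided test from the question (C earlier than F plus 30 days), and the explicit `format='%m/%d/%Y'` is an assumption added here to make date parsing unambiguous.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': ['12/15/2011', '11/11/2001', '08/30/2015', '07/3/1999', '03/03/2000'],
                   'D': [1, 7, 3, 4, 5],
                   'F': ['12/1/2011', '10/1/2000', '8/15/2015', '12/1/2011', '12/1/2011']})

# Assumption: all dates are month/day/year
df['C'] = pd.to_datetime(df['C'], format='%m/%d/%Y')
df['F'] = pd.to_datetime(df['F'], format='%m/%d/%Y')

# Row-wise test using both columns, done outside of groupby
within = df['C'] < df['F'] + pd.Timedelta(days=30)

# Broadcast "any row in the group passed" back to every row of that group
df['indicator'] = within.groupby(df['B']).transform('any').astype('int64')

print(df['indicator'].tolist())
```

Because the comparison happens before grouping, transform only ever has to handle a single boolean Series, which sidesteps the two-column limitation.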
Brian Huey
  • And to assign: `df['indicator'] = df.groupby('B').apply(date_test).swaplevel().reset_index(-1, drop=True)` – Zeugma Nov 22 '16 at 19:12
  • @Boud thank you for adding the comment. It seems like the following also works: `df['indicator'] = df.groupby('B').apply(date_test).reset_index(0, drop=True)`, or am I missing something? – user1700890 Nov 22 '16 at 21:00
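I don't know if it will help you, but you could try something like:

df = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four', 'indicator':[]}

# Append 1 if the value 'one' is present, otherwise 0.
# Note: "'one' in df.values() == True" is a chained comparison and is always False.
if 'one' in df.values():
    df['indicator'].append(1)
else:
    df['indicator'].append(0)

and then run it in a for loop to read all the elements in your 'C' column.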

Dadep