I have a dataset of sales and purchases on a marketplace that looks a little like this:
User_ID | Transaction_Type | Date | Amount
1 | Sale | 01/01/14 | 200.00
2 | Purchase | 01/01/14 | 30.00
...
I need to separate customers who have only bought or only sold something from customers who have both bought and sold something at least once.
I am trying to create a function that checks whether a user has done both. If a user has done both, the user should be marked 'yes', otherwise 'no'.
So far I have tried this:
def user_filter(df):
    if df in df['User_ID'].filter(lambda x : ((x['Transaction_Type']=='Sale').any())&((x['Transaction_Type']=='Purchase').any())):
        return 'yes'
    else:
        return 'no'
df['cross'] = df['User_ID'].apply(user_filter)
Assume that User_ID 1 appears again later in the dataset with a Purchase. I would hope to get this back:
User_ID | Transaction_Type | Date | Amount | cross
1 | Sale | 01/01/14 | 200.00 | yes
2 | Purchase | 01/01/14 | 30.00 | no
Instead, I get the following error:
'int' object is not subscriptable
When I apply it to the whole DataFrame instead of just the Series, I get:
KeyError: ('User_ID', 'occurred at index User_ID')
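For reference, here is a minimal sketch of one way the expected `cross` column might be produced, assuming pandas and the column names shown above. It is not a fix of `user_filter` itself: it swaps the row-wise `apply` for a per-user `groupby(...).transform("any")`. The third sample row (the later Purchase by User_ID 1) is made up for illustration, including its date and amount.

```python
import pandas as pd

# Sample data modelled on the table in the question. The third row is the
# assumed later Purchase by User_ID 1; its date and amount are hypothetical.
df = pd.DataFrame({
    "User_ID": [1, 2, 1],
    "Transaction_Type": ["Sale", "Purchase", "Purchase"],
    "Date": ["01/01/14", "01/01/14", "02/01/14"],
    "Amount": [200.00, 30.00, 15.00],
})

# Note: df["User_ID"].apply(func) passes one scalar User_ID at a time to func,
# so indexing that argument with ['Transaction_Type'] cannot work.

# Row-level flags for each transaction type.
is_sale = df["Transaction_Type"].eq("Sale")
is_purchase = df["Transaction_Type"].eq("Purchase")

# For each user, check whether any row is a Sale / any row is a Purchase,
# and broadcast that per-user result back onto every row of the user.
has_sale = is_sale.groupby(df["User_ID"]).transform("any")
has_purchase = is_purchase.groupby(df["User_ID"]).transform("any")

# A user is "cross" only if both kinds of transaction occur at least once.
df["cross"] = (has_sale & has_purchase).map({True: "yes", False: "no"})

print(df)
```

With this sample data, both rows for User_ID 1 are marked 'yes' and the row for User_ID 2 is marked 'no'. Using `transform` keeps the result aligned with the original rows, so it can be assigned directly as a new column.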