Iterate over one dataframe to calculate new features - Python

Question

I am using a dataframe of credit card transactions with the following columns:

timestamp, transaction_id, buyer_id, status

I want no generate a new column q_app_1d that calculates for each transaction_id, the amount of previous transaction_id's based on a condition (same buyer_id, status = 1, difference between timestamp's <= 1 day).

I have tried doing this using self-join (aka joining the dataframe with itself), but have failed to do so successfully. I know how to do this easily in SQL, but I can't get it working with Pandas.

Any help or tips are greatly appreciated!

Edit:

Sample input:

timestamp, transaction_id, buyer_id, status
01/01/2020 00:00:00, 1, abc123, 1
01/01/2020 00:25:00, 2, abc123, 1
01/01/2020 00:30:00, 3, abc123, 1
01/01/2020 00:45:00, 4, def456, 1
02/01/2020 08:55:00, 5, abc123, 1
02/01/2020 10:55:00, 6, def456, 1
03/01/2020 12:55:00, 7, def456, 1

Sample output:

timestamp, transaction_id, buyer_id, status, q_app_1d
01/01/2020 00:00:00, 1, abc123, 1, 0
01/01/2020 00:25:00, 2, abc123, 1, 1 #(considers transaction_id 1)
01/01/2020 00:30:00, 3, abc123, 1, 2 #(considers transaction_id 1,2)
01/01/2020 00:45:00, 4, def456, 1, 0
02/01/2020 08:55:00, 5, abc123, 1, 0 #(more than one day since transaction_id 3)
02/01/2020 10:55:00, 6, def456, 1, 0 #(more than one day since transaction_id 4)
03/01/2020 08:55:00, 7, def456, 1, 1 #(considers transaction_id 6)

Please provide sample input and expected output. – Mayank Porwal May 04 '20 at 22:22 — Mayank Porwal, May 04 '20 at 22:22
already edited, hope it's clear! – yavin_ar May 04 '20 at 22:50 — yavin_ar, May 04 '20 at 22:50

score 0 · Accepted Answer · answered May 05 '20 at 01:13

0

This should work:

df['timestamp'] = pd.to_datetime(df['timestamp'],dayfirst=True)
df = df.set_index('timestamp')

_df = (df.groupby('buyer_id')['status'].rolling('24H').count()-1).reset_index()
_df.columns = ['buyer_id','timestamp','q_app_1d']
df = df.reset_index()
df = df.merge(_df)
df.head(7)

answered May 05 '20 at 01:13

Partha Mandal

1,391
8
14

Hey Martín! Sure. Btw can you give an upvote to the answer as well? – Partha Mandal May 05 '20 at 15:10
i did, but since i'm new it says that it doesn't display it publicly ): – yavin_ar May 05 '20 at 17:12
Oh! I missed that. Thanks, btw. – Partha Mandal May 05 '20 at 18:24

Iterate over one dataframe to calculate new features - Python

1 Answers1