How to get sum of transaction amount of every customer after some date in other data frame?

Question

I have two data frame. one has multiple transaction for different customers on different dates and another data frame specifying a particular date. I wanted to get the sum of all transactions after the date specified into the other data frame. My 1st input table looks like :

ID  txn_date    txn_amt
1   01-Jan-19   3000
1   15-Jan-19   2000
1   16-Feb-19   5000
2   03-Jan-19   3000
2   19-Feb-19   9000
2   09-Mar-19   2000
2   01-Apr-19   1000
3   02-Apr-19   7000

and the 2nd table looks like this:

ID  notice_date
1   14-Jan-19
2   02-Feb-19
4   05-Feb-19
3   05-Mar-19

My expected output is :

ID  txn_amt
1   7000
2   12000
3   7000
4   0

The logic is, ID 1 in 1st table has 3 transactions where ID 1's notice_date is 14-jan-19, so after 14-jan-19 there are only two transaction of ID 1, so in the output table ID ones value is 7000 (2000 + 5000).

please share the data as you have it, e.g. `print(df.head())` — IanS, Jul 17 '19 at 12:45
before asking a question, you should check [how to ask good pandas questions](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — IanS, Jul 17 '19 at 12:51

score 0 · Answer 1 · answered Jul 17 '19 at 13:04

0

You can join both tables, using the ID as an index (don't worry about duplication for this purpose), convert all dates from string to some datetime format, and then sum the txn_amt on the condition that txn_date>notice_date, and finally group by ID and sum.

Like this:

joint_df = df1.set_index('ID').join(df2.set_index('ID'))
joint_df['txn_date'] = joint_df['txn_date'].apply(pd.Timestamp)
joint_df['notice_date'] = joint_df['notice_date'].apply(pd.Timestamp)

joint_df['txn_amt'][joint_df['txn_date'] >= joint_df['notice_date']].groupby('ID').sum()

answered Jul 17 '19 at 13:04

Itamar Mushkin

2,803
2
16
32

it is working but the ID 4 the value should come as 0, which is not comming – roy Jul 17 '19 at 13:10
You're right, you can wrap it up with another join to one the ID column in the first dataframe (which will result in `NaN` at ID=4) and then `fillna(0)`... – Itamar Mushkin Jul 17 '19 at 13:37

score 0 · Accepted Answer · answered Jul 17 '19 at 13:12

0

IIUC, you can try this:

df1.merge(df2, on = ['ID'], how='right')\
   .query('txn_date > notice_date or txn_date != txn_date')\
   .fillna(0).groupby('ID')['txn_amt'].sum()

Output:

ID
1     7000.0
2    12000.0
3     7000.0
4        0.0
Name: txn_amt, dtype: float64

There are a couple of "tricks" in this solution. First, merge on 'ID' with how set to 'right' keeps the ID equal to 4 where there are no records related to 4 in df1. Secondly, filtering the results of the merge using two conditions. The first condition of dates, then in cases where that is no txn_date for 'ID' 4, then we want to keep those NaN records. In python, a trick for checking NaN is NaN != NaN is True.

answered Jul 17 '19 at 13:12

Scott Boston

147,308
15
139
187

does `.query` understand the string values in the date columns as datetime? – Itamar Mushkin Jul 17 '19 at 13:36
@ItamarMushkin I don't think so. You would need to switch to slicing the dataframe instead of using `.query`. `.query` is a shortcut and doesn't work in a lot of different scenarios, like column names with a space or special characters. – Scott Boston Jul 17 '19 at 13:39
Or you could just transform the data into a datetime object (from datetime or pandas) before use? – Itamar Mushkin Jul 18 '19 at 07:33

How to get sum of transaction amount of every customer after some date in other data frame?

2 Answers2