Computing difference between same column, consecutive rows grouped by another column in python

Question

I have a dataframe with 2 columns: UserProduct (a user/product combination) and OrderDates. I have multiple order dates for each user/product group (1 to 5 dates per group).

I have sorted the data in descending order so that the most recent order date comes first in each group.

I would like to compute the differences between the order dates within each group, in days, and put them in a new column of my dataframe.

(i.e. OrderDate1-OrderDate2, OrderDate1-OrderDate3, OrderDate1-OrderDate4, OrderDate1-OrderDate5.) If a group has no more than 2 orders, I want it to move on to the next group.

Sample data:

>>> bf_recency
        UserProduct               OrderDates
0   12111211/123232  2020-03-12 17:19:16.103
1   12111211/123232  2020-03-12 18:10:45.974
2   12111211/123232  2020-03-11 17:19:16.103
3   12111211/123232  2020-03-10 18:10:45.974
4   12111211/123232  2020-03-10 18:10:45.974
5   165870101/73066  2020-03-12 19:49:15.752

Expected Output:

        UserProduct               diff(in days)
0   12111211/123232               N/A
1   12111211/123232               0
2   12111211/123232               1
3   12111211/123232               2
4   12111211/123232               2
5   165870101/73066               N/A

So far I have this:

df_frequency = df.groupby(['UserProduct'])['ORDER_DATE'].nlargest(5).reset_index(name='OrderDates')

df_frequency.sort_values(by=['OrderDates'], inplace=True, ascending=False)

df_freq = df_frequency.groupby(['UserProduct'])['OrderDates'].transform(lambda x: x.diff())  # STUCK HERE

Hello Ranjith. Please read up on [how to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — timgeb, Apr 07 '20 at 06:41
@Ranjith Please provide a sample input and expected output. It helps explain the question better. — Mayank Porwal, Apr 07 '20 at 06:46
@MayankPorwal I've edited the post, can you take a look now? — PyCharmer, Apr 07 '20 at 07:23
@PyCharmer Now that's how you improve a question! Thank you. — timgeb, Apr 10 '20 at 08:50

Mayank Porwal · Accepted Answer · 2020-04-07T08:52:11.953

You can do this:

In [500]: df                                                                                                                                                                                                
Out[500]: 
       UserProduct              OrderDates
0  12111211/123232 2020-03-12 17:19:16.103
1  12111211/123232 2020-03-12 18:10:45.974
2  12111211/123232 2020-03-11 17:19:16.103
3  12111211/123232 2020-03-10 18:10:45.974
4  12111211/123232 2020-03-10 18:10:45.974
5  165870101/73066 2020-03-12 19:49:15.752

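In [575]: df['diff(in days)'] = pd.Timedelta(0)  # timedelta placeholder column (assumes pandas is imported as pd)
In [583]: grp = df.groupby('UserProduct')['OrderDates']
In [576]: for i, group in grp:
     ...:     # subtract the group's first date from every row; .loc avoids chained assignment
     ...:     df.loc[group.index, 'diff(in days)'] = group.sub(group.iloc[0])
     ...: 
In [581]: df['diff(in days)'] = df['diff(in days)'].dt.days.abs()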

In [582]: df                                                                                                                                                                                                
Out[582]: 
       UserProduct              OrderDates  diff(in days)
0  12111211/123232 2020-03-12 17:19:16.103              0
1  12111211/123232 2020-03-12 18:10:45.974              0
2  12111211/123232 2020-03-11 17:19:16.103              1
3  12111211/123232 2020-03-10 18:10:45.974              2
4  12111211/123232 2020-03-10 18:10:45.974              2
5  165870101/73066 2020-03-12 19:49:15.752              0
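
For reference, the same first-row-minus-every-row computation can be sketched without an explicit loop. This is a minimal sketch rather than the answerer's code: it assumes `OrderDates` is already a datetime column and that each group's reference date is its first row, and the frame below simply rebuilds the sample data from the question. The `first_date` name is illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question (assumed already ordered per group).
df = pd.DataFrame({
    "UserProduct": ["12111211/123232"] * 5 + ["165870101/73066"],
    "OrderDates": pd.to_datetime([
        "2020-03-12 17:19:16.103",
        "2020-03-12 18:10:45.974",
        "2020-03-11 17:19:16.103",
        "2020-03-10 18:10:45.974",
        "2020-03-10 18:10:45.974",
        "2020-03-12 19:49:15.752",
    ]),
})

# Broadcast each group's first order date to every row of that group.
first_date = df.groupby("UserProduct")["OrderDates"].transform("first")

# Timedelta relative to the first date, reduced to whole days with the sign dropped.
df["diff(in days)"] = (df["OrderDates"] - first_date).dt.days.abs()

print(df)
```

Note that `.dt.days` takes the floor of each timedelta, so a row that is 1 day 23 hours earlier than the group's first row comes out as 2 after `abs()`, which is what the output above shows for rows 3 and 4.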

Hi Mayank, but this is calculating the difference between subsequent dates. I want my result to be the first row minus every other row of that group in the column (as shown in my question). — PyCharmer, Apr 07 '20 at 07:57
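
The distinction raised in this comment can be seen side by side. A minimal sketch, assuming the question's sample data with `OrderDates` as datetimes: `diff()` within a group subtracts the previous row, whereas subtracting from `transform("first")` compares every row with the group's first row. The names `day`, `consecutive`, `from_first` and `is_first` are illustrative. Dropping the time of day with `normalize()` and masking the first row of each group are assumed ways of reproducing the calendar-day values and `N/A` entries of the expected output.

```python
import pandas as pd

df = pd.DataFrame({
    "UserProduct": ["12111211/123232"] * 5 + ["165870101/73066"],
    "OrderDates": pd.to_datetime([
        "2020-03-12 17:19:16.103",
        "2020-03-12 18:10:45.974",
        "2020-03-11 17:19:16.103",
        "2020-03-10 18:10:45.974",
        "2020-03-10 18:10:45.974",
        "2020-03-12 19:49:15.752",
    ]),
})

# Drop the time of day so differences count calendar days.
day = df["OrderDates"].dt.normalize()
grp = day.groupby(df["UserProduct"])

# Consecutive differences: each row minus the previous row of its group.
consecutive = grp.diff().dt.days

# First-row differences: the group's first date minus each row's date.
from_first = (grp.transform("first") - day).dt.days

# Blank out the first row of every group, which has nothing to compare against.
is_first = df.groupby("UserProduct").cumcount() == 0
df["diff(in days)"] = from_first.mask(is_first)

print(df.assign(consecutive=consecutive))
```

With this data, `consecutive` never exceeds one day in magnitude, while the first-row version reaches 2 for the two orders placed on 2020-03-10.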
