Find How many customers returned after 1year of inactivity,2year of inactivity etc

Question

I have a transaction data frame as follows.

        customer_id order_date     product_id   value($)
             139    2015-07-08           A       0.174
             139    2015-06-08           B       1.236
             432    2017-08-09           E       0.205
             474    2019-08-27           A       0.374
             164    2022-05-08           D       0.278

How do I find how many customers returned after one year (365 days) of inactivity, two years (730 days) of inactivity etc.?

timdelta as @ArpitOmprakash said then you could groupby that column if you wanted/needed to — rayad, Aug 25 '22 at 14:45
Can you guys write an answer with the codes? My attempts were not successful. — NAS_2339, Aug 25 '22 at 14:47
It would be easier to do if you provided a [dataframe constructor](https://stackoverflow.com/q/20109391/14627505) — Vladimir Fokow, Aug 25 '22 at 17:29
if a customer had two periods of 1 year inactivity, then what should the output be? 1 or 2? — Vladimir Fokow, Aug 25 '22 at 17:36

Vladimir Fokow · Answer 1 · 2022-08-25T18:02:40.213

Preparations:

df['order_date'] = pd.to_datetime(df['order_date'])
inact_period = pd.Timedelta('365 days')

Solution:

def f(ser, inact_period):
    ser = ser.sort_values()
    return (ser - ser.shift() > inact_period).any()
    
n = df.groupby('customer_id')['order_date'].agg(f, inact_period=inact_period).sum()

n is an integer - the result.

If one customer that has multiple inactivity periods should count multiple times, then you can replace .any() with .sum().

Example:

df = pd.DataFrame({'customer_id': [1, 1, 1, 1, 2],
                   'order_date': pd.to_datetime(['2015', '2016', '2017', '2019', '2022'])})

   customer_id order_date
0          139 2015-01-01
1          139 2016-01-01
2          139 2017-01-01
3          139 2019-01-01
4          164 2022-01-01

With .sum():

If I set inact_period = pd.Timedelta('365 days'), then n == 2 (since 2016 is a leap year there are 366 days between 2016-01-01 and 2017-01-01).

If I set inact_period = pd.Timedelta('366 days'), then n == 1.

With .any():

If I set inact_period = pd.Timedelta('1 d'), then n == 1.

Find How many customers returned after 1year of inactivity,2year of inactivity etc

1 Answers1