0

I have a transaction data frame as follows.

        customer_id order_date     product_id   value($)
             139    2015-07-08           A       0.174
             139    2015-06-08           B       1.236
             432    2017-08-09           E       0.205
             474    2019-08-27           A       0.374
             164    2022-05-08           D       0.278

How do I find how many customers returned after one year (365 days) of inactivity, two years (730 days) of inactivity etc.?

NAS_2339
  • 353
  • 2
  • 13

1 Answers1

1

Preparations:

df['order_date'] = pd.to_datetime(df['order_date'])
inact_period = pd.Timedelta('365 days')

Solution:

def f(ser, inact_period):
    ser = ser.sort_values()
    return (ser - ser.shift() > inact_period).any()
    
n = df.groupby('customer_id')['order_date'].agg(f, inact_period=inact_period).sum() 

n is an integer - the result.

If one customer that has multiple inactivity periods should count multiple times, then you can replace .any() with .sum().

Example:

df = pd.DataFrame({'customer_id': [1, 1, 1, 1, 2],
                   'order_date': pd.to_datetime(['2015', '2016', '2017', '2019', '2022'])})

   customer_id order_date
0          139 2015-01-01
1          139 2016-01-01
2          139 2017-01-01
3          139 2019-01-01
4          164 2022-01-01

With .sum():

If I set inact_period = pd.Timedelta('365 days'), then n == 2 (since 2016 is a leap year there are 366 days between 2016-01-01 and 2017-01-01).

If I set inact_period = pd.Timedelta('366 days'), then n == 1.

With .any():

If I set inact_period = pd.Timedelta('1 d'), then n == 1.

Vladimir Fokow
  • 3,728
  • 2
  • 5
  • 27