0

I am working with transaction data and would like to get past 12 months rolling Active Customer Base but on monthly level.

Below is an example of transaction data i have - Transaction Data

Cust-ID Date-of-Purchase    Quantity    Customer-Tag

N01847  01-04-18    10  B2C
N01847  15-04-18    20  B2C
N01847  25-06-19    20  B2C
N01847  12-02-20    100 B2C
N01847  25-03-20    150 B2C
N02341  01-12-19    20  B2B
N02341  02-02-20    150 B2B
N02011  25-01-19    10  B2B
N01033  02-06-18    30  B2C
N01033  02-04-19    40  B2C
N01033  15-04-19    50  B2C
N01033  15-02-20    20  B2C
N01010  16-01-18    100 B2C
N01010  25-02-18    250 B2C
N01010  03-09-18    10  B2C
N01010  04-04-19    250 B2C
N01010  15-06-19    100 B2C
N01010  26-07-19    40  B2C
N01010  17-09-19    10  B2C
N01010  17-09-19    10  B2C
N01010  20-03-20    20  B2C
N09100  20-01-18    20  B2B
N09100  20-01-18    20  B2B
N09100  25-01-20    20  B2B
N09100  25-01-20    20  B2B
N09100  25-01-20    20  B2B   ֿ
N09100  25-01-20    20  B2B

And below is what i am expecting to get with python - desired output

Month-Year  B2C-Rolling-past-12months-Active-Customers  Monthly-Active-Customers    Monthly-Active-Rate

201801  100230  25058   25.0%

201802  100524  25634   25.5%

201803  100810  25213   25.0%

201804  101253  25495   25.2%

201805  101351  25525   25.2%

201806  103210  25998   25.2%

201807  103678  26122   25.2%

201808  103977  26202   25.2%

201809  104512  26342   25.2%

201810  104624  26376   25.2%

201811  105479  26597   25.2%

201812  111256  28059   26.2%

201901  112247  28314   25.2%

201902  112947  28497   25.2%

201903  113508  28644   25.2%

201904  113857  28737   25.2%

201905  114572  28924   25.2%

201906  115443  29149   25.3%

201907  116056  29310   25.3%

201908  116528  29435   25.3%

201909  116971  29553   25.3%

201910  117647  29729   25.3%

201911  118492  29949   25.3%

201912  124095  31371   26.3%

202001  124895  31580   25.3%

202002  125653  31778   25.3%

202003  126320  31953   25.3%

I would really appreciate any help with the python (spyder) code that will help to get the desired result.

dspencer
  • 4,297
  • 4
  • 22
  • 43

1 Answers1

0

You may want to use pandas, and then do something like that:

df["Date-of-Purchase"] = pd.to_datetime(df["Date-of-Purchase"], dayfirst=True)
df["Month"] = df["Date-of-Purchase"].dt.month
df["Year"] = df["Date-of-Purchase"].dt.year
res = df.groupby(["Year", "Month"])["Cust-ID"].nunique()

This would give you the number of unique customers per month. Assuming you have data for all the months, you can now use rolling to get a rolling sum for 12 months (I used 3 months in the code for easier debugging):

monthly_customers = df.groupby(["Year", "Month"])["Cust-ID"].apply(lambda x: frozenset(x.values))
monthly_customers = monthly_customers.reset_index()

monthly_customers = pd.concat([monthly_customers] + [monthly_customers["Cust-ID"].shift(i) for i in range(1, 3)], axis ="columns")
monthly_customers.columns = ["Year", "Month"] + [ f"shift_{i}" for i in range(3) ]

def count_unique(row):
    total_set = frozenset()
    columns = [ f"shift_{i}" for i in range(3) ]
    for col in columns:
        if row.get(col) and type(row.get(col)) == frozenset:
            total_set = total_set | row.get(col)
    return len(total_set)

monthly_customers["N_month_count"] = monthly_customers.apply(count_unique, axis=1)
monthly_customers

If you don't have data for all the months, you'll need to fill in missing months.

Roy2012
  • 11,755
  • 2
  • 22
  • 35
  • this will give me the no. of monthly unique customers, however the rolling 12 month sum of this will not give me the no. of unique customers in the last 12 months. - it will give me the sum of monthly unique which is different. – kassandra martin May 16 '20 at 09:21
  • @kassandramartin: could you add some clarifications to your question? In general we recommend making an attempt on the problem first, so that readers can have a good immediate view of what you are wanting to do. – halfer May 16 '20 at 11:03
  • @kassandramartin - you're right. Sum is not the same as count-unique. Fixed my solution. – Roy2012 May 16 '20 at 17:24
  • @Roy2012 thanks a ton. I just had to change 3 to 12 since i needed 12 month rolling, and this seems to work. Cheers – kassandra martin May 16 '20 at 19:38