0

I'm trying to break down yearly subscriptions to monthly subscriptions by fee.

Example dataset-

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5],
    'Plan' : ['Yearly', 'Monthly', 'Monthly', 'Yearly', 'Yearly'],
    'Join_Date': ['1/10/2020', '1/15/2020', '2/21/2020', '2/21/2020', '3/09/2020'],
    'Fee' : [120, 12, 18, 86, 144]
})

df['Join_Date'] = pd.to_datetime(df['Join_Date'])

df

Here, customer 1 has a yearly subscription fee of $120 between January 2020 and January 2021. I want my data frame to breakdown the fee between 2020-01 and 2020-12 to $10 ($120/12 months) each by displaying each month of that year with the monthly fee ($10).

I tried a bunch of resampling methods, but it didn't work. One example-

def atom(row):
    if df.Plan=='Yearly':
        return (df.Fee/12)

df.groupby(pd.Grouper(key='Join_Date', freq='1M')).apply(atom)

Expected output for first customer-

enter image description here

Is there another method?

Sp_95
  • 133
  • 9
  • Should your final dataframe have a row for each month? Can you add your desired result to the question? – Maximilian Peters Nov 28 '20 at 21:15
  • can you not simply do `df.loc[df['Plan'] == 'Yearly'],'new_col'] = df.Fee/12` – Joe Ferndz Nov 28 '20 at 21:16
  • @JoeFerndz No, that doesn't output each month between date X and date Y with the per month value. I'm trying to display each month between Jan 2020 to Jan 2021 with $10 in the same dataframe. – Sp_95 Nov 28 '20 at 21:21
  • Can you please post the desired output so we can know what you are looking for please? – Joe Ferndz Nov 28 '20 at 21:22

2 Answers2

2

First expand the yearly records by np.repeat(). Then perform the following action selectively on df1["Plan"] == "Yearly":

  • Monthly fee can be computed directly.
  • Increments to months can be obtained using groupby-cumcount and mapped onto pd.DateOffset(months=). Such method receives a PerformanceWarning, which can be suppressed this way (omitted in the code).

Code

# expand the Yearly records
df1 = df.loc[np.repeat(df.index, df["Plan"].map({"Yearly": 12, "Monthly":1}))]

# compute monthly fee and join date 
df1.loc[df1["Plan"] == "Yearly", "Fee"] /= 12
df1.loc[df1["Plan"] == "Yearly", "Join_Date"] += \
    df1.groupby(["Customer_ID", "Plan"]).cumcount()\
       .loc[df1["Plan"] == "Yearly"]\
       .map(lambda i: pd.DateOffset(months=i))

Result

print(df1)
   Customer_ID     Plan  Join_Date        Fee
0            1   Yearly 2020-01-10  10.000000
0            1   Yearly 2020-02-10  10.000000
0            1   Yearly 2020-03-10  10.000000
0            1   Yearly 2020-04-10  10.000000
0            1   Yearly 2020-05-10  10.000000
0            1   Yearly 2020-06-10  10.000000
0            1   Yearly 2020-07-10  10.000000
0            1   Yearly 2020-08-10  10.000000
0            1   Yearly 2020-09-10  10.000000
0            1   Yearly 2020-10-10  10.000000
0            1   Yearly 2020-11-10  10.000000
0            1   Yearly 2020-12-10  10.000000
1            2  Monthly 2020-01-15  12.000000
2            3  Monthly 2020-02-21  18.000000
3            4   Yearly 2020-02-21   7.166667
3            4   Yearly 2020-03-21   7.166667
3            4   Yearly 2020-04-21   7.166667
3            4   Yearly 2020-05-21   7.166667
3            4   Yearly 2020-06-21   7.166667
3            4   Yearly 2020-07-21   7.166667
3            4   Yearly 2020-08-21   7.166667
3            4   Yearly 2020-09-21   7.166667
3            4   Yearly 2020-10-21   7.166667
3            4   Yearly 2020-11-21   7.166667
3            4   Yearly 2020-12-21   7.166667
3            4   Yearly 2021-01-21   7.166667
4            5   Yearly 2020-03-09  12.000000
4            5   Yearly 2020-04-09  12.000000
4            5   Yearly 2020-05-09  12.000000
4            5   Yearly 2020-06-09  12.000000
4            5   Yearly 2020-07-09  12.000000
4            5   Yearly 2020-08-09  12.000000
4            5   Yearly 2020-09-09  12.000000
4            5   Yearly 2020-10-09  12.000000
4            5   Yearly 2020-11-09  12.000000
4            5   Yearly 2020-12-09  12.000000
4            5   Yearly 2021-01-09  12.000000
4            5   Yearly 2021-02-09  12.000000
Bill Huang
  • 4,491
  • 2
  • 13
  • 31
0

Are you looking for something like this?

import pandas as pd
df = pd.DataFrame({
    'Cutomer_ID': [1, 2, 3, 4, 5],
    'Plan' : ['Yearly', 'Monthly', 'Monthly', 'Yearly', 'Yearly'],
    'Join_Date': ['1/10/2020', '1/15/2020', '2/21/2020', '2/21/2020', '3/09/2020'],
    'Fee' : [120, 12, 18, 86, 144]
})

df['Join_Date'] = pd.to_datetime(df['Join_Date'])

df['Monthly_Fee'] = df['Fee']
df.loc[df['Plan'] == 'Yearly','Monthly_Fee'] = (df.Fee/12).round(2)

print (df)

Output of this will be:

   Cutomer_ID     Plan  Join_Date  Fee  Monthly_Fee
0           1   Yearly 2020-01-10  120        10.00
1           2  Monthly 2020-01-15   12        12.00
2           3  Monthly 2020-02-21   18        18.00
3           4   Yearly 2020-02-21   86         7.17
4           5   Yearly 2020-03-09  144        12.00
Joe Ferndz
  • 8,417
  • 2
  • 13
  • 33