I know there are some similar questions here, but please read on as I have looked and tried to adapt existing solutions without any luck. I have a data frame that pulls data for year and quarter. In the scenario shown below prevYearLeadCount
displays data from Q1 2020. To be clear
prevYearLeadCount
will always display the lead count from the previous year in the same quarter. The below is just an example to show how the data is structured. Also, looking at the data below, since there is data for Q4 of 2019 I would expect 2020 Q4 prevYearLeadCount
to equal 236
[
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2017,
"quarter": 2,
"leadCount": 151,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2018,
"quarter": 2,
"leadCount": 73,
"prevYearLeadCount": 151.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2018,
"quarter": 3,
"leadCount": 271,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2018,
"quarter": 4,
"leadCount": 173,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 1,
"leadCount": 209,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 2,
"leadCount": 274,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 3,
"leadCount": 311,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2019,
"quarter": 4,
"leadCount": 236,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 1,
"leadCount": 245,
"prevYearLeadCount": 209.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 2,
"leadCount": 430,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 3,
"leadCount": 907,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 4,
"leadCount": 657,
"prevYearLeadCount": 0.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2021,
"quarter": 1,
"leadCount": 609,
"prevYearLeadCount": 245.0
}
]
Looking at the data above, I would expect for the year of 2020 to look like this:
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 1,
"leadCount": 209,
"prevYearLeadCount": 209.0
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 2,
"leadCount": 430,
"prevYearLeadCount": 274
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 3,
"leadCount": 907,
"prevYearLeadCount": 311
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2020,
"quarter": 4,
"leadCount": 657,
"prevYearLeadCount": 236
},
{
"salesforceAccountId": 3148,
"accountName": "Account Name",
"year": 2021,
"quarter": 1,
"leadCount": 609,
"prevYearLeadCount": 245.0
}
As seen here I have tried the following:
df['prev_year_lead_count'] = df.groupby("quarter").lead_count.shift()[ (df.year == df.year.shift() + 1) ]
This is close as I get what I would expect in some cases, but not all. In some frames I see 0s where I should data definitely exist in the previous year and quarter. I am trying to do exactly what is seen here but each year is broken into quarters.
One more thing I have tried is combining python and pandas a bit. The idea here is to loop over existing years in the frame, and check the previous year to see if the quarter exist. If it does, do the pandas.
qs = [1, 2, 3, 4]
for year in leads_df["year"].unique():
df = leads_df[leads_df["year"] == year - 1]
for q in qs:
if q in df["quarter"]:
leads_df["prev_year_lead_count"] = leads_df.groupby("quarter")["lead_count"].shift(+1)
leads_df["prev_year_cost"] = leads_df.groupby("quarter")["cost"].shift(+1)
leads_df["prev_year_ga_spent"] = leads_df.groupby("quarter")["ga_spent"].shift(+1)
leads_df["prev_year_fb_spent"] = leads_df.groupby("quarter")["fb_spent"].shift(+1)
leads_df["prev_year_monthly_package_cost"] = leads_df.groupby("quarter")[
"monthly_package_cost"
].shift(+1)
leads_df["prev_year_cpl"] = leads_df.groupby("quarter")["cpl"].shift(+1)