
I have a DataFrame and I want to find which store has the best quarterly growth rate in Q3.

  Store    Date     Weekly_Sales
0   1   2012-03-31  18951097.69
1   1   2012-06-30  21036965.58
2   1   2012-09-30  18633209.98
3   1   2012-12-31  9580784.77
4   2   2012-03-31  22543946.63
5   2   2012-06-30  25085123.61
6   2   2012-09-30  22396867.61
7   2   2012-12-31  11470757.52 

I managed to loop through the rows and got this far, but after that I can't find a way forward. I think I have to go to the next row, get its sales, and then compare, but I'm not sure how to do that. I want to compare index 1 and 2 of Store 1 to find the growth rate, then do the same for Store 2 (here index 5 and 6), and so on, as I have 45 stores in total.

new_df = []
for index, row in monthly_sales.iterrows():
    if index == 1:  # not sure what condition to put here
        q2 = row['Weekly_Sales']
        q3 = row['Weekly_Sales']  # this should be the next row's value, but how?
        growth_rate = (q3 - q2) / q2 * 100
        new_df.append([row['Store'], growth_rate])
        #print(index, row['Store'], row['Date'], row['Weekly_Sales'])
        #exit
new_df

The output could look something like this:

  Store Growth Rate
0   1      6.67890
1   2      5.54327

I am a newbie to Python and Pandas.

Megan Fox

2 Answers


You can try:

df["Date"] = pd.to_datetime(df["Date"])
df["Weekly_Sales"] = pd.to_numeric(df["Weekly_Sales"])


out = df.sort_values(by=["Store", "Date"]) \
        .groupby(["Store"]) \
        .agg(growth_Q3=("Weekly_Sales", lambda x: (x.iloc[2] - x.iloc[1])/(x.iloc[1]) * 100))

Explanations:

  1. Convert the columns to the appropriate types (only needed if they are not already). To check the types, you can use dtypes.

    1. Convert Date to a datetime object using to_datetime
    2. Convert Weekly_Sales to numeric using to_numeric
  2. Sort the values by Store and Date to be sure the dates are in chronological order. We can use sort_values

  3. Group by Store to compute the growth rate for each store

  4. For each group, aggregate the rows with agg and a custom aggregation function:

    1. First, we compute the growth rate with a lambda function, using iloc to select the Q2 and Q3 values. The formula used is: (Q3-Q2)/Q2 * 100.
    2. Then, we use agg's named-aggregation syntax to name the result growth_Q3. Putting "Weekly_Sales" before the lambda says the lambda will be applied to the "Weekly_Sales" column.
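As a quick sanity check (a small sketch, not part of the original answer), applying that formula by hand to Store 1's sample values from the question:

```python
# Applying (Q3 - Q2) / Q2 * 100 by hand to Store 1's sample values
q2 = 21036965.58   # Weekly_Sales on 2012-06-30 (Q2)
q3 = 18633209.98   # Weekly_Sales on 2012-09-30 (Q3)
growth_Q3 = (q3 - q2) / q2 * 100
print(round(growth_Q3, 6))
```

This matches the -11.426342 that the full groupby code below produces for Store 1. Note the rate is negative, since Q3 sales are lower than Q2.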

Full code + illustration:

# Step 1 (Optional if types are already correct)
print(df.dtypes)
# Store                    int64
# Date                    object
# Weekly_Sales            object
# dtype: object

df["Date"] = pd.to_datetime(df["Date"])
df["Weekly_Sales"] = pd.to_numeric(df["Weekly_Sales"])
print(df.dtypes)
# Store                    int64
# Date            datetime64[ns]
# Weekly_Sales           float64
# dtype: object

# Step 2 (Optional if data already sorted)
print(df.sort_values(by=["Store", "Date"]))
#    Store       Date  Weekly_Sales
# 0      1 2012-03-31   18951097.69
# 1      1 2012-06-30   21036965.58
# 2      1 2012-09-30   18633209.98
# 3      1 2012-12-31    9580784.77
# 4      2 2012-03-31   22543946.63
# 5      2 2012-06-30   25085123.61
# 6      2 2012-09-30   22396867.61
# 7      2 2012-12-31   11470757.52

# Steps 3 and 4
print(df.sort_values(by=["Store", "Date"])
        .groupby(["Store"])
        .agg(growth_Q3=("Weekly_Sales", lambda x: (x.iloc[2] - x.iloc[1])/x.iloc[1] * 100)))
#        growth_Q3
# Store
# 1     -11.426342
# 2     -10.716535
Alexandre B.
#get the quarters into a separate column (Date must already be a datetime dtype):
df['Quarter'] = df.Date.dt.quarter
#get the groupings for the percent change from quarters 2 to 3 : 
pct_change = (df.query('Quarter in [2,3]')
              .groupby('Store')
              .Weekly_Sales
              .pct_change()
              .mul(100)
              .dropna()
             )
pct_change

2   -11.426342
6   -10.716535
Name: Weekly_Sales, dtype: float64

#get store number at third quarter:
store = df.loc[df['Quarter']==3,'Store']

2    1
6    2
Name: Store, dtype: int64

#merge the two objects
pd.concat([store,pct_change],axis=1)

    Store   Weekly_Sales
2   1   -11.426342
6   2   -10.716535
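For reference, the three steps above (filter to Q2/Q3, per-store pct_change, re-attach the store number) can also be written as one chain; a sketch using the sample data from the question:

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "Store": [1, 1, 1, 1, 2, 2, 2, 2],
    "Date": pd.to_datetime(["2012-03-31", "2012-06-30",
                            "2012-09-30", "2012-12-31"] * 2),
    "Weekly_Sales": [18951097.69, 21036965.58, 18633209.98, 9580784.77,
                     22543946.63, 25085123.61, 22396867.61, 11470757.52],
})
df["Quarter"] = df.Date.dt.quarter

result = (df.query("Quarter in [2, 3]")                 # keep Q2 and Q3 rows
            .assign(growth=lambda d: d.groupby("Store")  # percent change per store
                                      .Weekly_Sales
                                      .pct_change()
                                      .mul(100))
            .dropna(subset=["growth"])                   # drop the Q2 rows (NaN growth)
            [["Store", "growth"]]
            .reset_index(drop=True))
print(result)
```

Same numbers as above, with Store kept as an ordinary column rather than merged back in afterwards.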

Another approach:

We know the data is arranged per store, with 4 rows per store, one per quarter, so quarters 2 and 3 will be at positions 1 and 2 within each store group:

filtered = (df
             #the nth function allows us to pick rows per group
            .groupby('Store').nth([1,2])
            .pivot(columns='Quarter',values='Weekly_Sales')
            .pct_change(axis=1)
            .mul(100)
            .dropna(axis=1)
            .rename(columns={3:'growth'})
           )

filtered

Quarter growth
Store   
1       -11.426342
2       -10.716535
sammywemmy
    Interesting, I didn't know the `pct_change` function – Alexandre B. May 05 '20 at 09:49
  • https://stackoverflow.com/questions/61612351/pandas-unable-to-filter-rows-by-quarter-in-specific-year any help on this? I guess the dataset itself is calculating the wrong quarters – Megan Fox May 05 '20 at 11:51