
I have a DataFrame and I want to find which store has the best quarterly growth rate in Q3.

  Store    Date     Weekly_Sales
0   1   2012-03-31  18951097.69
1   1   2012-06-30  21036965.58
2   1   2012-09-30  18633209.98
3   1   2012-12-31  9580784.77
4   2   2012-03-31  22543946.63
5   2   2012-06-30  25085123.61
6   2   2012-09-30  22396867.61
7   2   2012-12-31  11470757.52 

I managed to loop through the rows and got this far, but after that I can't find a way forward. I think I have to go to the next row, get its sales, and then compare, but I'm not sure how to do that. I want to compare index 1 and 2 of Store 1 to find the growth rate, then do the same for Store 2 (here index 5 and 6), and so on, as I have 45 stores in total.

new_df = []
for index, row in monthly_sales.iterrows():
    if index == 1:  # not sure what condition to put here
        q2 = row['Weekly_Sales']
        q3 = row['Weekly_Sales']  # this should be the next row's value, but how?
        growth_rate = (q3 - q2) / q2 * 100
        new_df.append([row['Store'], growth_rate])
        #print(index, row['Store'], row['Date'], row['Weekly_Sales'])
        #exit
new_df

The output could look something like this:

  Store Growth Rate
0   1      6.67890
1   2      5.54327

I am a newbie to Python and Pandas.

Megan Fox

2 Answers


You can try:

df["Date"] = pd.to_datetime(df["Date"])
df["Weekly_Sales"] = pd.to_numeric(df["Weekly_Sales"])


out = df.sort_values(by=["Store", "Date"]) \
        .groupby(["Store"]) \
        .agg(growth_Q3=("Weekly_Sales", lambda x: (x.iloc[2] - x.iloc[1])/(x.iloc[1]) * 100))

Explanations:

  1. Convert the columns to the appropriate types (only needed if they are not already). To check the types, you can use dtypes.

    1. Convert Date to a datetime object using to_datetime
    2. Convert Weekly_Sales to numeric using to_numeric
  2. Sort the values by Store and Date to be sure the dates are in chronological order. We can use sort_values

  3. Group by Store to compute the growth rate for each store

  4. For each group, aggregate the rows with agg and a custom aggregation function:

    1. First, we compute the growth rate with a lambda function, using iloc to select the Q2 and Q3 values. The formula used is: (Q3-Q2)/Q2 * 100.
    2. Then, we use agg's named-aggregation syntax to name the result growth_Q3. Putting "Weekly_Sales" before the lambda says the lambda will be applied to the "Weekly_Sales" column.
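As a quick sanity check (a small sketch, not part of the original answer), applying that formula by hand to Store 1's sample values from the question:

```python
# Applying (Q3 - Q2) / Q2 * 100 by hand to Store 1's sample values
q2 = 21036965.58   # Weekly_Sales on 2012-06-30 (Q2)
q3 = 18633209.98   # Weekly_Sales on 2012-09-30 (Q3)
growth_Q3 = (q3 - q2) / q2 * 100
print(round(growth_Q3, 6))
```

This matches the -11.426342 that the full groupby code below produces for Store 1. Note the rate is negative, since Q3 sales are lower than Q2.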

Full code + illustration:

# Step 1 (Optional if types are already correct)
print(df.dtypes)
# Store                    int64
# Date                    object
# Weekly_Sales            object
# dtype: object

df["Date"] = pd.to_datetime(df["Date"])
df["Weekly_Sales"] = pd.to_numeric(df["Weekly_Sales"])
print(df.dtypes)
# Store                    int64
# Date            datetime64[ns]
# Weekly_Sales           float64
# dtype: object

# Step 2 (Optional if data already sorted)
print(df.sort_values(by=["Store", "Date"]))
#    Store       Date  Weekly_Sales
# 0      1 2012-03-31   18951097.69
# 1      1 2012-06-30   21036965.58
# 2      1 2012-09-30   18633209.98
# 3      1 2012-12-31    9580784.77
# 4      2 2012-03-31   22543946.63
# 5      2 2012-06-30   25085123.61
# 6      2 2012-09-30   22396867.61
# 7      2 2012-12-31   11470757.52

# Steps 3 and 4
print(df.sort_values(by=["Store", "Date"])
        .groupby(["Store"])
        .agg(growth_Q3=("Weekly_Sales", lambda x: (x.iloc[2] - x.iloc[1])/x.iloc[1] * 100)))
#        growth_Q3
# Store
# 1     -11.426342
# 2     -10.716535
Alexandre B.
#get the quarters into a separate column (Date must already be a datetime dtype):
df['Quarter'] = df.Date.dt.quarter
#get the groupings for the percent change from quarters 2 to 3 : 
pct_change = (df.query('Quarter in [2,3]')
              .groupby('Store')
              .Weekly_Sales
              .pct_change()
              .mul(100)
              .dropna()
             )
pct_change

2   -11.426342
6   -10.716535
Name: Weekly_Sales, dtype: float64

#get store number at third quarter:
store = df.loc[df['Quarter']==3,'Store']

2    1
6    2
Name: Store, dtype: int64

#merge the two objects
pd.concat([store,pct_change],axis=1)

    Store   Weekly_Sales
2   1   -11.426342
6   2   -10.716535
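For reference, the three steps above (filter to Q2/Q3, per-store pct_change, re-attach the store number) can also be written as one chain; a sketch using the sample data from the question:

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "Store": [1, 1, 1, 1, 2, 2, 2, 2],
    "Date": pd.to_datetime(["2012-03-31", "2012-06-30",
                            "2012-09-30", "2012-12-31"] * 2),
    "Weekly_Sales": [18951097.69, 21036965.58, 18633209.98, 9580784.77,
                     22543946.63, 25085123.61, 22396867.61, 11470757.52],
})
df["Quarter"] = df.Date.dt.quarter

result = (df.query("Quarter in [2, 3]")                 # keep Q2 and Q3 rows
            .assign(growth=lambda d: d.groupby("Store")  # percent change per store
                                      .Weekly_Sales
                                      .pct_change()
                                      .mul(100))
            .dropna(subset=["growth"])                   # drop the Q2 rows (NaN growth)
            [["Store", "growth"]]
            .reset_index(drop=True))
print(result)
```

Same numbers as above, with Store kept as an ordinary column rather than merged back in afterwards.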

Another approach:

We know the data is arranged per store, with 4 rows per store, one per quarter, so quarters 2 and 3 will be at positions 1 and 2 within each store group:

filtered = (df
             #the nth function allows us to pick rows per group
            .groupby('Store').nth([1,2])
            .pivot(columns='Quarter',values='Weekly_Sales')
            .pct_change(axis=1)
            .mul(100)
            .dropna(axis=1)
            .rename(columns={3:'growth'})
           )

filtered

Quarter growth
Store   
1       -11.426342
2       -10.716535
sammywemmy
    Interesting, I didn't know the `pct_change` function – Alexandre B. May 05 '20 at 09:49
  • https://stackoverflow.com/questions/61612351/pandas-unable-to-filter-rows-by-quarter-in-specific-year any help on this? I guess the dataset itself is calculating the wrong quarters – Megan Fox May 05 '20 at 11:51