0

I have a dataset like below-

  Store   Date     Weekly_Sales         
0   1   2010-05-02  1643690.90  
1   1   2010-12-02  1641957.44  
2   1   2010-02-19  1611968.17  
3   1   2010-02-26  1409727.59  
4   1   2010-05-03  1554806.68

It has 100 stores in all. I want to filter the data of the year 2012 by Quarter

# Filter out only the data in 2012 from the dataset

import datetime as dt
df['Date'] = pd.to_datetime(df['Date'])
ds_2012 = df[df['Date'].dt.year == 2012]

# Calculate Q on the dataset
ds_2012 = ds_2012.sort_values(['Date'],ascending=True)
quarterly_sales = ds_2012.groupby(['Store', pd.Grouper(key='Date', freq='Q')])['Weekly_Sales'].sum()
quarterly_sales.head(20)

Output Received

Store     Date      
1      2012-03-31    18951097.69
       2012-06-30    21036965.58
       2012-09-30    18633209.98
       2012-12-31     9580784.77

The Summation of of Q2(2012-06-30) and Q3(2012-09-30) both are incorrect when filtered in excel. I am a newbie to Pandas

Megan Fox
  • 435
  • 2
  • 6
  • 20
  • Can you explain your code and provide an answer? Its not working though you have didn't group by Store – Megan Fox May 05 '20 at 11:48
  • kindly share the source data as well – sammywemmy May 05 '20 at 11:57
  • I have shared the picture above of the source data. Do you want me to share the excel file? – Megan Fox May 05 '20 at 12:06
  • no, the data itself, just a couple of rows. u could do ```df.to_dict('records')``` and share that as well. this [link](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) should guide u on how to go about it – sammywemmy May 05 '20 at 12:08
  • also, kindly share what ur expected output should be. data as well, not pics – sammywemmy May 05 '20 at 12:08
  • @sammywemmy I want the same output store wise and sorted with date from asc but its just that the sum is not accurately calculated when i am filtering in excel. – Megan Fox May 05 '20 at 12:17
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/213172/discussion-between-megan-fox-and-sammywemmy). – Megan Fox May 05 '20 at 13:30

1 Answers1

0

You can groupby store and resample the DataFrame quarterly:

import pandas as pd
df=pd.concat([pd.DataFrame({'Store':[i]*12, 'Date':pd.date_range(start='2020-01-01', periods=12, freq='M'), 'Sales':list(range(12))}) for i in [1,2]])
df.groupby('Store').resample('Q', on='Date').sum().drop('Store', axis=1)

                  Sales
Store Date             
1     2020-03-31      3
      2020-06-30     12
      2020-09-30     21
      2020-12-31     30
2     2020-03-31      3
      2020-06-30     12
      2020-09-30     21
      2020-12-31     30

Maybe check the groupby and resample docs aswell.

luigigi
  • 4,146
  • 1
  • 13
  • 30
  • I am not getting the output as you have mentioned by the solution , kindly use my dataset and provide the solution . I m getting the same output as you have shown but i need the Weekly Sales to be summed up . – Megan Fox May 06 '20 at 07:14
  • please provide an expected output. its totally not clear what you want to achieve. weekly sales can be summed up like this. and it would be good to extend your sample data – luigigi May 06 '20 at 07:20
  • the output you have provided is correct but i m getting 3,12,21,30 as well instead of the store sales sum which i provided above . i am a newbie so i am not getting how to do it with your solution – Megan Fox May 06 '20 at 07:34
  • Did you get what i am looking for? – Megan Fox May 06 '20 at 08:07
  • simply use your dataframe and not the one i created as an example – luigigi May 06 '20 at 08:09
  • Can you rewrite the example with my dataframe , assume its data_2012 – Megan Fox May 06 '20 at 09:42