1

I have a three-month sales data set. I need to get the sales total count by week wise and group by an agent. and want to get daily standard division by the agent in sperate table

Agent District Agent_type  Date          Device
12    abc        br         01/02/2020    4233     
12    abc        br         01/02/2020    4123     
12    abc        br         03/02/2020    4314
12    abc        br         05/02/2020    4134
12    abc        br         19/02/2020    5341
12    abc        br         19/02/2020    52141
12    abc        br         19/02/2020    12141
12    abc        br         26/02/2020    4224
12    abc        br         28/02/2020    9563
12    abc        br         05/03/2020    0953
12    abc        br         10/03/2020    1212
12    abc        br         15/03/2020    4309
12    abc        br         02/03/2020    4200
12    abc        br         30/03/2020    4299
12    abc        br         01/04/2020    4211
12    abc        br         10/04/2020    2200
12    abc        br         19/04/2020    3300
12    abc        br         29/04/2020    3222
12    abc        br         29/04/2020    32222
12    abc        br         29/04/2020    4212
12    abc        br         29/04/2020    20922
12    abc        br         29/04/2020    67822
13    aaa        ae         15/02/2020    22222
13    aaa        ae         29/02/2020    42132
13    aaa        ae         10/02/2020    89022
13    aaa        ae         28/02/2020    31111
13    aaa        ae         28/02/2020    31132
13    aaa        ae         28/02/2020    31867
13    aaa        ae         14/02/2020    91122
output 
Agent District  Agent_type 1st_week_feb   2nd_week_feb   3rd_week_feb .....   4th_week_apr     
12      abc       br   count           count          count                 count
13      aaa       ae   count           count          count                 count

2nd output - daily std by agent

Agent  tot_sale   daily_std
12       22         2.40
13        7         1.34
Shulaz Shan
  • 175
  • 8

2 Answers2

3

You can use:

#convert values to datetimes
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

#get weeks strating by 1 
week = df['Date'].dt.isocalendar().week 
df['Week'] = (week - week.min() + 1)
#lowercase months
df['Month'] = df['Date'].dt.strftime('%b').str.lower()
print (df)
    Agent       Date  Device  Week Month
0      12 2020-02-01    4233     1   feb
1      12 2020-02-01    4123     1   feb
2      12 2020-02-03    4314     2   feb
3      12 2020-02-05    4134     2   feb
4      12 2020-02-19    5341     4   feb
5      12 2020-02-26    4224     5   feb
6      12 2020-02-28    9563     5   feb
7      12 2020-03-05     953     6   mar
8      12 2020-03-10    1212     7   mar
9      12 2020-03-15    4309     7   mar
10     12 2020-03-02    4200     6   mar
11     12 2020-03-30    4299    10   mar
12     12 2020-04-01    4211    10   apr
13     12 2020-04-10    2200    11   apr
14     12 2020-04-19    3300    12   apr
15     12 2020-04-29    3222    14   apr
16     13 2020-02-15   22222     3   feb
17     13 2020-02-29   42132     5   feb
18     13 2020-03-10   89022     7   mar
19     13 2020-03-28   31111     9   mar
20     13 2020-04-14   91122    12   apr

#if need count rows use crosstab
df1 = pd.crosstab(df['Agent'], [df['Week'], df['Month']])
df1.columns = df1.columns.map(lambda x: f'{x[0]}_week_{x[1]}')
print (df1)
       1_week_feb  2_week_feb  3_week_feb  4_week_feb  5_week_feb  6_week_mar  \
Agent                                                                           
12              2           2           0           1           2           2   
13              0           0           1           0           1           0   

       7_week_mar  9_week_mar  10_week_apr  10_week_mar  11_week_apr  \
Agent                                                                  
12              2           0            1            1            1   
13              1           1            0            0            0   

       12_week_apr  14_week_apr  
Agent                            
12               1            1  
13               1            0  

#if need sum Device column use pivot_table 
df2 = df.pivot_table(index='Agent', 
                     columns=['Week', 'Month'], 
                     values='Device',
                     aggfunc='sum', 
                     fill_value=0)
df2.columns = df2.columns.map(lambda x: f'{x[0]}_week_{x[1]}')
print (df2)
       1_week_feb  2_week_feb  3_week_feb  4_week_feb  5_week_feb  6_week_mar  \
Agent                                                                           
12           8356        8448           0        5341       13787        5153   
13              0           0       22222           0       42132           0   

       7_week_mar  9_week_mar  10_week_apr  10_week_mar  11_week_apr  \
Agent                                                                  
12           5521           0         4211         4299         2200   
13          89022       31111            0            0            0   

       12_week_apr  14_week_apr  
Agent                            
12            3300         3222  
13           91122            0  

EDIT: Thank you @Henry Yik for pointed another way for count weeks by days:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Week'] = (df["Date"].dt.day-1)//7+1
df['Month'] = df['Date'].dt.strftime('%b').str.lower()
print (df)
    Agent       Date  Device  Week Month
0      12 2020-02-01    4233     1   feb
1      12 2020-02-01    4123     1   feb
2      12 2020-02-03    4314     1   feb
3      12 2020-02-05    4134     1   feb
4      12 2020-02-19    5341     3   feb
5      12 2020-02-26    4224     4   feb
6      12 2020-02-28    9563     4   feb
7      12 2020-03-05     953     1   mar
8      12 2020-03-10    1212     2   mar
9      12 2020-03-15    4309     3   mar
10     12 2020-03-02    4200     1   mar
11     12 2020-03-30    4299     5   mar
12     12 2020-04-01    4211     1   apr
13     12 2020-04-10    2200     2   apr
14     12 2020-04-19    3300     3   apr
15     12 2020-04-29    3222     5   apr
16     13 2020-02-15   22222     3   feb
17     13 2020-02-29   42132     5   feb
18     13 2020-03-10   89022     2   mar
19     13 2020-03-28   31111     4   mar
20     13 2020-04-14   91122     2   apr
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    I think OP wants number of week per month, something like `(df["Date"].dt.day-1)//7+1`. – Henry Yik Oct 05 '20 at 06:23
  • @jezrael if i want to add another column called 'agent district' as well then how to change this code df1 = pd.crosstab(df['Agent'], [df['Week'], df['Month']]) df1.columns = df1.columns.map(lambda x: f'{x[0]}_week_{x[1]}') print (df1) – Shulaz Shan Oct 13 '20 at 11:55
  • @jezrael could you plz support – Shulaz Shan Oct 13 '20 at 12:36
  • 1
    @ShulazShan- @ShulazShan - I got it, change `df1 = pd.crosstab(df['Agent'], [df['Week'], df['Month']]) df1.columns = df1.columns.map(lambda x: f'{x[0]}_week_{x[1]}')` to `df1 = pd.crosstab([df['Agent'], df['District'], df['Agent_type']], [df['Week'], df['Month']]) df1.columns = df1.columns.map(lambda x: f'{x[0]}_week_{x[1]}') df = df.reset_index()` – jezrael Oct 13 '20 at 13:25
  • @jezrael, if i want to get the agent's daily sale stand division, how can code? i try this aug_sales.groupby("ACT_ACTDEVICE").apply(np.std) but not sure whether is correct or not? – Shulaz Shan Oct 14 '20 at 06:36
  • @ShulazShan - It seems correct, do you need `ddof=1` or `ddof=0` ? – jezrael Oct 14 '20 at 06:37
  • aug_sales.groupby("Agent").apply(np.std) sry not "ACT_ACTDEVICE". ddof mean? – Shulaz Shan Oct 14 '20 at 06:45
  • @ShulazShan - it is degree of freedom, `np.std` is by default `0` in pandas `std` is default 1 – jezrael Oct 14 '20 at 06:46
  • i edited my question above. how can i get the second table as i mention above – Shulaz Shan Oct 14 '20 at 06:55
  • @ShulazShan - do you think `aug_sales.groupby("ACT_ACTDEVICE").agg(['size', np.std]).rested_index()` ? – jezrael Oct 14 '20 at 06:57
  • how can i get "2nd output - daily std by agent" as i mentioned above. if run "aug_sales.groupby("ACT_ACTDEVICE").agg(['size', np.std]).rested_index()" this i am getting std for device colum as well – Shulaz Shan Oct 14 '20 at 07:04
  • 1
    Yop, `aug_sales.groupby("Agent")['ACT_ACTDEVICE'].agg(['size', np.std]).rested_index()` – jezrael Oct 14 '20 at 07:07
  • @jezrael, if i want to get mean as well, how to modify above code – Shulaz Shan Oct 14 '20 at 07:26
  • `aug_sales.groupby("Agent")['ACT_ACTDEVICE'].agg(['size', np.std, 'mean']).rested_index() ` – jezrael Oct 14 '20 at 07:30
  • @jezrael i think we need to consider the date column to get daily std to know? here we considering ACT_ACTDEVICE. i am trying to get each agent daily std – Shulaz Shan Oct 14 '20 at 07:41
  • @ShulazShan - Unfortnately not understand. – jezrael Oct 14 '20 at 08:49
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/223008/discussion-between-shulaz-shan-and-jezrael). – Shulaz Shan Oct 14 '20 at 08:55
1

Assuming that Date column has been converted to datetime, you can do your task in the following one-liner:

df.groupby(['Agent', pd.Grouper(key='Date', freq='W-MON', closed='left',
    label='left')]).count().unstack(level=1, fill_value=0)

For your data sample the result is:

          Device                                                         
Date  2020-01-27 2020-02-03 2020-02-17 2020-02-24 2020-03-02 2020-03-09  2020-03-30 2020-04-06 2020-04-13 2020-04-27 2020-02-10 2020-03-23
Agent                                                                                                                                      
12             2          2          1          2          2          2           2          1          1          1          0          0 
13             0          0          0          1          0          1           0          0          1          0          1          1 

​The column name is from the date of a Monday "opening" the week.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41