2
data = {'machine_id': [1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,2000], 
    'item_id': [100,100,100,200,300,100,100,100,200,300,100,100,100,200,300,100,200],
    'Date': ['2022-03-01','2022-03-02','2022-03-03','2022-03-04','2022-03-05',
    '2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05',
    '2022-05-01','2022-05-02','2022-05-03','2022-05-04','2022-05-05','2022-03-06','2022-03-10'],
    'Sales_qty':[5,6,5,7,8,1,3,7,9,6,4,7,8,9,4,5,8]}
df = pd.DataFrame(data)
    df['total']=df.groupby(['machine_id','item_id','Date'])['Sales_qty'].transform('sum')
    df['sales_avg']=df['Sales_qty']/df['total']

in output column i want last month sales avg.

|machine_id|item_id|Date|Sales_qty|total|current_sales_avg| last_month_avg|
0   1000    100 2022-03-01  5   13  0.384615    NaN
1   1000    100 2022-03-01  6   13  0.461538    NaN
2   3000    100 2022-03-01  5   5   1.000000    NaN
3   2000    200 2022-03-01  7   10  0.700000    NaN
4   3000    300 2022-03-01  8   8   1.000000    NaN
5   1000    100 2022-04-01  1   4   0.250000    0.384615
6   1000    100 2022-04-01  3   4   0.750000    0.461538
7   3000    100 2022-04-01  7   7   1.000000    1.000000
8   2000    200 2022-04-01  9   9   1.000000    0.700000
9   3000    300 2022-04-01  6   6   1.000000    1.000000
10  1000    100 2022-05-01  4   11  0.363636    0.250000
11  1000    100 2022-05-01  7   11  0.636364    0.750000
12  3000    100 2022-05-01  8   8   1.000000    1.000000
13  2000    200 2022-05-01  9   9   1.000000    1.000000
14  3000    300 2022-05-01  4   4   1.000000    1.000000
15  1000    100 2022-03-06  2   13  0.153846    Nan
16  2000    200 2022-03-10  3   10  0.300000    Nan

Need code get the output. For March month the sales avg will be nan because we dont have feb month data. For april month we have last month data that is of march and for may month we have april month data.

  • related: [How to shift entire groups in pandas groupby](https://stackoverflow.com/questions/27305135/how-to-shift-entire-groups-in-pandas-groupby) – Vladimir Fokow Aug 18 '22 at 08:10

2 Answers2

2

Between different month groups - the values of the same day will be aligned with each other - regardless of the differences in their number and order.

Requirement: days must be unique within each month.

I am using my approach from this question: Can .apply use information from other groups?.

df['Date'] = pd.to_datetime(df['Date'])
temp = df[['Date', 'sales_avg']]  # temporary df

# Set index as: day of the month
# (it will be used to align the values different month groups by day)
temp.index = temp['Date'].dt.day

grouped = temp.groupby(temp['Date'].dt.to_period('M'))['sales_avg']

group_keys = list(grouped.groups.keys())  # group keys are: year_&_month
first_group_key = group_keys[0]
prev_keys = iter(group_keys)

def get_prev_avg(ser):
    if ser.name == first_group_key:
        return np.nan
    return grouped.get_group(next(prev_keys)).reindex(ser.index)

prev_avg = grouped.transform(get_prev_avg)  # new Series (with index: day)
prev_avg.index = df.index                   # now with the original index
df['prev_avg'] = prev_avg

With your current data:

print(df)

    machine_id  item_id       Date  Sales_qty  total  sales_avg  prev_avg
0         1000      100 2022-03-01          5      5        1.0       NaN
1         1000      100 2022-03-02          6      6        1.0       NaN
2         3000      100 2022-03-03          5      5        1.0       NaN
3         2000      200 2022-03-04          7      7        1.0       NaN
4         3000      300 2022-03-05          8      8        1.0       NaN
5         1000      100 2022-04-01          1      1        1.0       1.0
6         1000      100 2022-04-02          3      3        1.0       1.0
7         3000      100 2022-04-03          7      7        1.0       1.0
8         2000      200 2022-04-04          9      9        1.0       1.0
9         3000      300 2022-04-05          6      6        1.0       1.0
10        1000      100 2022-05-01          4      4        1.0       1.0
11        1000      100 2022-05-02          7      7        1.0       1.0
12        3000      100 2022-05-03          8      8        1.0       1.0
13        2000      200 2022-05-04          9      9        1.0       1.0
14        3000      300 2022-05-05          4      4        1.0       1.0
15        1000      100 2022-03-06          5      5        1.0       NaN
16        2000      200 2022-03-10          8      8        1.0       NaN

Vladimir Fokow
  • 3,728
  • 2
  • 5
  • 27
  • If i will chage the date column and i want the result on groupby machine_id and item_id. | Date | 0 2022-03-01 1 2022-03-02 2 2022-03-03 3 2022-03-04 4 2022-03-05 5 2022-04-01 6 2022-04-02 7 2022-04-03 8 2022-04-04 9 2022-04-05 10 2022-05-01 11 2022-05-02 12 2022-05-03 13 2022-05-04 14 2022-05-05 – sachin Dhulshette Aug 18 '22 at 05:00
  • not only shift the rows.. for particular month i want the previous month sales average. If iam taking march month then there should previous month that feb month average. for april month the previous month is march avr..likeways i want to find out the previous month sales average for the particual month by grouping the machine and item id. – sachin Dhulshette Aug 18 '22 at 07:27
  • @sachinDhulshette, in that case, please provide the expected output you want. Right now the expected output you've provided doesn't align with your requirement – Vladimir Fokow Aug 18 '22 at 08:15
  • The expected output is same as you have done.. but only change is date column. That is dates are 2022-03-01....2022-03-02..2022-03-03.... 2022-04-01..2022-04-02..2022-04-03.. likewise – sachin Dhulshette Aug 18 '22 at 08:59
  • i have changed my dataframe and output also..plzz check . could u help me to write code for the updated dataframe. – sachin Dhulshette Aug 18 '22 at 15:43
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247380/discussion-between-sachin-dhulshette-and-vladimir-fokow). – sachin Dhulshette Aug 19 '22 at 05:06
1

You can first agg the groupby Date as list and then explode. But before that capture the index which is the first group's index so that later you can fill that with np.nan

df2 = df.groupby('Date')['sales_avg'].agg(list).shift().to_frame()
na_index = df2['sales_avg'].isna().index[0]  
df2 = df2.bfill().explode('sales_avg')
df2.loc[na_index] = np.nan
df['sales_avg'] = df2['sales_avg']

print(df):

            machine_id  item_id  Sales_qty  total sales_avg
Date                                                       
2022-03-01        1000      100          5     31       NaN
2022-03-01        1000      100          6     31       NaN
2022-03-01        3000      100          5     31       NaN
2022-03-01        2000      200          7     31       NaN
2022-03-01        3000      300          8     31       NaN
2022-04-01        1000      100          1     26   0.16129
2022-04-01        1000      100          3     26  0.193548
2022-04-01        3000      100          7     26   0.16129
2022-04-01        2000      200          9     26  0.225806
2022-04-01        3000      300          6     26  0.258065
2022-05-01        1000      100          4     32  0.038462
2022-05-01        1000      100          7     32  0.115385
2022-05-01       30000      100          8     32  0.269231
2022-05-01        2000      200          9     32  0.346154
2022-05-01        3000      300          4     32  0.230769
SomeDude
  • 13,876
  • 5
  • 21
  • 44
  • If i will chage the date column and i want the result on groupby machine_id and item_id. | Date | 0 2022-03-01 1 2022-03-02 2 2022-03-03 3 2022-03-04 4 2022-03-05 5 2022-04-01 6 2022-04-02 7 2022-04-03 8 2022-04-04 9 2022-04-05 10 2022-05-01 11 2022-05-02 12 2022-05-03 13 2022-05-04 14 2022-05-05 – sachin Dhulshette Aug 18 '22 at 05:02
  • not only shift the rows.. for particular month i want the previous month sales average. If iam taking march month then there should previous month that feb month average. for april month the previous month is march avr..likeways i want to find out the previous month sales average for the particual month by grouping the machine and item id. – sachin Dhulshette Aug 18 '22 at 07:25
  • i have changed my dataframe and output also..plzz check . could u help me to write code for the updated dataframe – sachin Dhulshette Aug 18 '22 at 15:44
  • I m not able to create df from your input. please check your input. – SomeDude Aug 18 '22 at 16:00
  • I have changed dataframe. resolve value error issue. – sachin Dhulshette Aug 19 '22 at 05:08