0

I have a dataset of weekly sales of a few stores, which looks something similar to :

store_id item_id week sales
store1 item1 2021-01 3
store1 item2 2021-01 2
store2 item1 2021-01 10
store2 item3 2021-01 1
store1 item1 2021-02 5
store1 item2 2021-02 1
store2 item1 2021-02 11
store1 item3 2021-03 6
store1 item1 2021-04 7
store2 item3 2021-04 2

I did a pivot on this data frame like :
df.groupby(['store_id', 'item_id', 'week'])['sales'].mean().unstack() which generates a new dataframe like the one below :

store_id item_id 2021-01 2021-02 2021-03 2021-04
store1 item1 3 5 0 7
store1 item2 2 1 0 0
store1 item3 0 0 6 0
store2 item1 10 11 0 0
store2 item3 1 0 0 2

I have performed the needed modifications to this new dataframe.
How do we get back the original dataframe structure from this modified pivot dataframe. such that I can represent each week number as a separate row rather than columns as in this dataframe.

Edit : I tried using melt as an example from Convert columns into rows with Pandas:

df.melt(id_vars=["store_id", "item_id"], 
        var_name="week", 
        value_name="sales")

But I get the error : The following 'id_vars' are not present in the DataFrame: ['store_id', 'item_id']. This seems probably because my df has multi-indices and not in serialised form.

Output of df.index

MultiIndex([('store1', 'item1'),
            ('store1', 'item2'),
           ....
           ....
           names=['store_id', 'item_id'], length=15305)

1 Answers1

1

You can use df.melt. Using the below dataframe:

df = pd.DataFrame({'store_id' : ['store1', 'store1', 'store1', 'store2', 'store2'],
                  'item_id' : ['item1', 'item2', 'item3', 'item1', 'item3'],
                  '2021-01' : [3, 2, 0, 10, 1],
                  '2021-02' : [5, 1, 0, 11, 0],
                  '2021-03' : [0, 0, 6, 0, 0],
                  '2021-04' : [7, 0, 0, 0, 2]})

  store_id item_id  2021-01  2021-02  2021-03  2021-04
0   store1   item1        3        5        0        7
1   store1   item2        2        1        0        0
2   store1   item3        0        0        6        0
3   store2   item1       10       11        0        0
4   store2   item3        1        0        0        2

You can then melt, using store and item id as id_vars, and values are the date columns:

df.melt(id_vars=['store_id', 'item_id'], var_name='week', value_name='sales')

This will give the below:

   store_id item_id     week  sales
0    store1   item1  2021-01      3
1    store1   item2  2021-01      2
2    store1   item3  2021-01      0
3    store2   item1  2021-01     10
4    store2   item3  2021-01      1
5    store1   item1  2021-02      5
6    store1   item2  2021-02      1
7    store1   item3  2021-02      0
8    store2   item1  2021-02     11
9    store2   item3  2021-02      0
10   store1   item1  2021-03      0
11   store1   item2  2021-03      0
12   store1   item3  2021-03      6
13   store2   item1  2021-03      0
14   store2   item3  2021-03      0
15   store1   item1  2021-04      7
16   store1   item2  2021-04      0
17   store1   item3  2021-04      0
18   store2   item1  2021-04      0
19   store2   item3  2021-04      2
Emi OB
  • 2,814
  • 3
  • 13
  • 29