
My dataframe contains quarterly data and, for some companies, monthly data as well.

import pandas as pd
df = pd.DataFrame({'quarter': ['2010-1', '2010-2', '2010-3','2010-4', '2011-1'],
                  'volume_quarter': [450, 450, 450, 450, 450],
                  'volume_month_1': [150, 150, 150, 150, 150],
                  'volume_month_2': [160, 160, 160, 160, 160],
                  'volume_month_3': [140, 140, 140, 140, 140]})
df

Gives:

quarter  volume_quarter  volume_month_1  volume_month_2  volume_month_3
2010-1   450             150             160             140
2010-2   450             150             160             140
2010-3   450             150             160             140
2010-4   450             150             160             140
2011-1   450             150             160             140

With the following code:

pd.melt(df, id_vars=['quarter'], value_vars=['volume_month_1', 'volume_month_2', 'volume_month_3'])

I get:

    quarter variable        value
0   2010-1  volume_month_1  150
1   2010-2  volume_month_1  150
2   2010-3  volume_month_1  150
3   2010-4  volume_month_1  150
4   2011-1  volume_month_1  150
5   2010-1  volume_month_2  160
6   2010-2  volume_month_2  160
7   2010-3  volume_month_2  160
8   2010-4  volume_month_2  160
9   2011-1  volume_month_2  160
10  2010-1  volume_month_3  140
11  2010-2  volume_month_3  140
12  2010-3  volume_month_3  140
13  2010-4  volume_month_3  140
14  2011-1  volume_month_3  140

Instead I'm trying to achieve the following:


    quarter variable        value
0   2010-1  volume_month_1  150
1   2010-1  volume_month_2  160
2   2010-1  volume_month_3  140
3   2010-2  volume_month_1  150
4   2010-2  volume_month_2  160
5   2010-2  volume_month_3  140
6   2010-3  volume_month_1  150
7   2010-3  volume_month_2  160
8   2010-3  volume_month_3  140
9   2010-4  volume_month_1  150
10  2010-4  volume_month_2  160
11  2010-4  volume_month_3  140
12  2011-1  volume_month_1  150
13  2011-1  volume_month_2  160
14  2011-1  volume_month_3  140

I'd like to achieve this so that I can run an ARIMA model on the monthly values.

Many thanks in advance!

Tldr
  • Welcome to Stack Overflow. Unfortunately, we cannot copy your picture. Please add some example data that we can copy. You can find a good explanation [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Erfan Apr 12 '19 at 19:42
  • Thanks! I hope it's clearer this way. – Tldr Apr 12 '19 at 23:11
  • Can you describe in words what's wrong with the result you have now, and what's better about the expected result you want? – John Zwinck Apr 12 '19 at 23:16
  • Hi John, I made another mistake in my edit. I have fixed it now. – Tldr Apr 13 '19 at 10:04

1 Answer

The only thing missing is a sort after the melt. This code:

df = (
    pd.melt(
        df,
        id_vars=["quarter"],
        value_vars=["volume_month_1", "volume_month_2", "volume_month_3"],
    )
    .sort_values(by="quarter")
    .reset_index(drop=True)
)

returns the result you want:

   quarter        variable  value
0   2010-1  volume_month_1    150
1   2010-1  volume_month_2    160
2   2010-1  volume_month_3    140
3   2010-2  volume_month_1    150
4   2010-2  volume_month_2    160
5   2010-2  volume_month_3    140
6   2010-3  volume_month_1    150
7   2010-3  volume_month_2    160
8   2010-3  volume_month_3    140
9   2010-4  volume_month_1    150
10  2010-4  volume_month_2    160
11  2010-4  volume_month_3    140
12  2011-1  volume_month_1    150
13  2011-1  volume_month_2    160
14  2011-1  volume_month_3    140
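
One caveat: sorting on "quarter" alone leaves the three rows of each quarter tied, and pandas' default sort algorithm is not documented as stable, so their relative order is not guaranteed in general. Below is a minimal, self-contained sketch that sorts on both columns and also derives a monthly period for each row, which may be handy before fitting a time-series model. It assumes that "2010-1" means the first quarter of 2010 and that volume_month_N is the N-th month of that quarter; the names long_df, month_cols and monthly are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "quarter": ["2010-1", "2010-2", "2010-3", "2010-4", "2011-1"],
    "volume_quarter": [450, 450, 450, 450, 450],
    "volume_month_1": [150, 150, 150, 150, 150],
    "volume_month_2": [160, 160, 160, 160, 160],
    "volume_month_3": [140, 140, 140, 140, 140],
})

month_cols = ["volume_month_1", "volume_month_2", "volume_month_3"]

long_df = (
    pd.melt(df, id_vars=["quarter"], value_vars=month_cols)
    # Sort on both keys so the order within a quarter is explicit
    # instead of depending on how ties happen to be resolved.
    .sort_values(by=["quarter", "variable"])
    .reset_index(drop=True)
)

# Assumption: "YYYY-Q" encodes year and quarter number, and the trailing
# digit of the column name is the month's position within the quarter.
parts = long_df["quarter"].str.split("-")
year = parts.str[0].astype(int)
quarter_no = parts.str[1].astype(int)
month_in_quarter = long_df["variable"].str.rsplit("_", n=1).str[-1].astype(int)

# Calendar month: quarter 1 covers months 1-3, quarter 2 covers 4-6, and so on.
month = (quarter_no - 1) * 3 + month_in_quarter

# Assemble dates from year/month/day columns, then reduce them to monthly periods.
dates = pd.to_datetime(pd.DataFrame({"year": year, "month": month, "day": 1}))
long_df["month"] = dates.dt.to_period("M")

# A monthly series indexed by period, in chronological order.
monthly = long_df.set_index("month")["value"]
print(monthly)
```

If the real data has gaps (companies without monthly figures), those rows would need to be dropped or filled before modelling; that step is not shown here.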
Szymon Maszke