1

I have a dataset made up of 4 columns, a numerator denominator, country, and month. I am pivoting it to get months as columns, country as index, and values as sum(numerator)/sum(denominator). The only problem I get is that my columns are all out of order. How can I sort the columns so earlier months appear first? I tried table = table.sort_index(1) with no luck.

table = pd.pivot_table(df, values=['Numerator', 'Denominator'], index='Country',
                columns=['Month'], aggfunc=np.sum)

table = table['Numerator'] / table['Denominator']

Edit with full example and data:

Data:

Denominator,Numerator,Country,Month
10,4,USA,1-Jan
6,2,USA,1-Jan
10,1,Canada,1-Jan
9,2,Canada,1-Jan
6,4,Canada,1-Feb
4,3,Canada,1-Feb

Code:

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv')

table = pd.pivot_table(df, values=['Numerator', 'Denominator'], index='Country',
                columns=['Month'], aggfunc=np.sum)
table = table['Numerator'] / table['Denominator']
print table

Output:

Month    1-Feb     1-Jan
Country                 
Canada     0.7  0.157895
USA        NaN  0.37500

Desired Output:

Month    1-Jan     1-Feb
Country                 
Canada     0.157895  0.7
USA        0.37500   NaN
user2242044
  • 8,803
  • 25
  • 97
  • 164

1 Answers1

1

Option 1
Impose sorting order for pivot, before pivot
This option works because pivot automatically sorts index and column values and displays them. Currently, Month is a string, so sorting will be done lexicographically. You can change this by a datetime conversion.

df.Month = (pd.to_datetime(df.Month, format='%d-%b'))

table = pd.pivot_table(
   df, 
   values=['Numerator', 'Denominator'], 
   index='Country',
   columns=['Month'], 
   aggfunc=np.sum
)
table = table['Numerator'] / table['Denominator']

table.columns = table.columns.strftime('%d-%b')

table

           01-Jan  01-Feb
Country                  
Canada   0.157895     0.7
USA      0.375000     NaN

Option 2
Reorder after pivot
If your data is stored in chronological order, you can just find df.Month.unique and use it to reindex your result.

table.reindex(columns=df.Month.unique())

Month       1-Jan  1-Feb
Country                 
Canada   0.157895    0.7
USA      0.375000    NaN

If that isn't the case (and your data isn't chronologically ordered), here's a little workaround using pd.to_datetime + pd.Series.argsort + unique.

u = df.Month.iloc[
       pd.to_datetime(df.Month, format='%d-%b').argsort()
].unique()

table.reindex(columns=u)

Month       1-Jan  1-Feb
Country                 
Canada   0.157895    0.7
USA      0.375000    NaN
cs95
  • 379,657
  • 97
  • 704
  • 746