1

I have created a pivot table with a three-level multi-index (Group, Product, and State). The state level is automatically sorted alphabetically, but I need to change the ordering to a custom, non-alphabetical ordering. I may need to also re-order the Group and Product levels in similar fashion.

pivot = data.pivot_table(rows=['Group', 'Product', 'State'], 
                     values = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 
                               'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 
                               'NOV', 'DEC'], fill_value=0, margins=True aggfunc=sum) 

cols = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

#I used this method to prevent the month names from being alphabetically sorted 
pivot = pivot[cols]

A portion of the pivot table looks like this...

                                 JUN     JUL    AUG
Group    Product     State   
Group A  Product A   AZ          0       0      0
                     CO          0       0      0
                     GA          0       0      0
                     IL          0       0      0
                     IN          0       0      0
                     KS          0       0      0
                     MN          0       0      0
                     MO          0       0      0

I need the ordering of the states to be as follows...

state_order = ['AZ','CO','ID','NV','OR','UT','WA','IA','KS','MN','MO','NE','ND','SD','GA','IL','IN','OH','WI']

I tried the reindex_axis() fuction, feeding in my list above and specifying level=2. However, the states were still sorted in alphabetical order.

Any insights into a fix would be much appreciated.

marshackVB
  • 43
  • 1
  • 5
  • might be a duplicate of http://stackoverflow.com/questions/26707171/sort-pandas-dataframe-based-on-list – aensm Feb 24 '15 at 16:14
  • The method in the post above worked very well for reordering and displaying observations in my DataFrame; however, when I created a pivot table from the DataFrame, the ordering is changed. The states, which were sorted property in my original DataFrame, are resorted into alphabetical order. – marshackVB Feb 25 '15 at 16:16
  • I believe I need a way to specifically re-order the pivot table's multi-index level 2 (state names) by providing a list, though I have tried and failed to accomplish this. – marshackVB Feb 25 '15 at 16:27

2 Answers2

4

You can try changing the data type of State to category.

data["state"] = data["state"].astype("category")

then set the sort order

data["state"].cat.set_categories(['AZ','CO','ID','NV','OR','UT','WA','IA','KS',
                  'MN','MO','NE','ND','SD','GA','IL','IN','OH','WI'],inplace=True)

EDIT: fyi, the category dtype is relatively new. 0.15.0 i believe

Bob Haffner
  • 8,235
  • 1
  • 36
  • 43
  • Thank you for your reply. I am still unable to prevent the pivot table mutli-index level 2, which represent the state names, from resorting alphabetically. The DataFrame that the pivot table is based on is reorganized properly by state. Once I create the pivot table, the ordering is changed. It seems that I need to specifically reorder the pivot table multi-index, rather than the DataFrame data it is based on. – marshackVB Feb 25 '15 at 16:22
  • What happens when you reset the indexes of pivot? Does it sort properly? – Bob Haffner Feb 25 '15 at 16:34
  • I tried pivot.reset_index() but the states are still sorted alphabetically, as they were in the original table. I also tried creating a pivot table using groupby() and unstack(). Still, the states were reordered alphabetically, even though in the original DataFrame, the states were classified as categorical and sorted in the order of the list I provided above. – marshackVB Feb 28 '15 at 18:12
  • I would appreciate any additional suggestions. Thanks for your help. – marshackVB Feb 28 '15 at 18:13
  • Sorry, I don't have any further ideas. You might try asking your question again with a different title to see if anyone else can help. – Bob Haffner Mar 03 '15 at 15:17
  • Relevant bug report here: https://github.com/pandas-dev/pandas/issues/15105 – naught101 Jan 18 '17 at 22:57
0

I got bitten by a similar problem, I will post a sample solution. You can use a dummy variable and groupby.

import pandas as pd 
import numpy as np 

index = pd.MultiIndex.from_tuples(zip(['a', 'a', 'a', 'b', 'b', 'b'],
                                      [0, 0, 0, 1, 1, 1],
                                      ['x', 'xx', 'xxx', 'x', 'xx', 'xxx']),
                                      names=['A', 'B', 'C'])
df = pd.DataFrame(np.random.rand(6, 3), index = index)
>>> df
                0         1         2
A B C                                
a 0 x    0.839870  0.763803  0.847632
    xx   0.619066  0.715492  0.467518
    xxx  0.917468  0.923521  0.278665
b 1 x    0.660889  0.209247  0.502107
    xx   0.069925  0.889308  0.836755
    xxx  0.967187  0.650482  0.138759



desired_order = ['xxx', 'xx', 'x']
df = df.reset_index(2)
mapping = { _ : desired_order.index(_) for _ in df['C'] }

df['Dummy'] = df['C'].map(lambda x: mapping[x]) #gives desired order
df = df.groupby(level=['A', 'B']).apply(lambda x: x.sort('Dummy'))
df.drop('Dummy', axis=1)


>>> df 
       C         0         1         2
A B                                   
a 0  xxx  0.273731  0.561262  0.970034
  0   xx  0.859063  0.459765  0.921475
  0    x  0.640554  0.045410  0.512320
b 1  xxx  0.678473  0.380712  0.252676
  1   xx  0.501426  0.577250  0.317702
  1    x  0.586227  0.927453  0.794912

There may be a better way using Categorical variables introduced in pandas 0.15, but I dont know a easier solution.

Alexander
  • 105,104
  • 32
  • 201
  • 196
Gecko
  • 1,379
  • 11
  • 14