python pandas dataframe with more header confusion

Question

EDIT: as @J Richard Snape pointed out - "Multiline header" is called "hierarchical index" in Pandas, which is solved in different question

I have two dataframes (there are corresponding indexes in each dataframe)

rep.head()

               zinc_ID
    Cluster No  
    3        ZINC08221075
    5        ZINC06920422
    7        ZINC06920381
    10       ZINC00968327
    4        ZINC08218951

ranked.head()

in case it is not obvious ranked dataframe has two 'headers' the uppermost is cons_all_rank spanning over 4 cells (like merged cell in excel). It is a result of calling groupby() and agg.([np.mean, np.std, np.median, len]).

           cons_all_rank                       
                    mean         std median len
Cluster No                                     
1              94.689655  106.622198   37.0  29
2              79.638889  101.176268   55.5  36
3             175.826087  133.990384  144.0  23

I performed

out = ranked.join(rep)
out[:3]

and received

    (cons_all_rank, mean)   (cons_all_rank, std)    (cons_all_rank, median)     (cons_all_rank, len)    zinc_ID
Cluster No                  
1   94.689655   106.622198  37.0    29  ZINC08034993
2   79.638889   101.176268  55.5    36  ZINC32840901
3   175.826087  133.990384  144.0   23  ZINC08221075

Next I wanted to sort the out dataframe by performing

out.sort(columns='(cons_all_rank, median)')

but I keep getting

KeyError: '(cons_all_rank, median)'

so I tried out.sort(columns='zinc_ID') and it worked flawlessly. I also tried

out.sort('cons_all_rank, median')
out.sort('median')

but neither is working. Is there a problem with the multiline header in ranked dataframe? If so, how can I get rid of it? ranked.drop('cons_all_rank', axis=0, inplace=True) did not work.

Is this what you want? stackoverflow.com/questions/14507794/python-pandas-how-to-flatten-a-hierarchical-index-in-columns — J Richard Snape, Mar 30 '15 at 17:05
exactly! I didn't know what to look for. Now I know that "Multiline header" = "hierarchical index". Will you post an answer so others can easily see or shall I edit? — redacted, Mar 30 '15 at 17:28
It's probably best to edit the question to highlight that what you were calling a "multiline header" is known in pandas as a "hierarchical index" and then mark it as a duplicate. It's not in the spirit of Stack overflow to add an answer here, as the other one is exactly what you want, comprehensive and accurate as far as I can see. Glad I could help you :) — J Richard Snape, Mar 30 '15 at 19:46
@camdenl Yes, of course, just giving OP time to see comment / learn ropes round here — J Richard Snape, Mar 30 '15 at 22:47

python pandas dataframe with more header confusion

0 Answers0