EDIT: as @J Richard Snape pointed out - "Multiline header" is called "hierarchical index" in Pandas, which is solved in different question
I have two dataframes (there are corresponding indexes in each dataframe)
rep.head()
zinc_ID
Cluster No
3 ZINC08221075
5 ZINC06920422
7 ZINC06920381
10 ZINC00968327
4 ZINC08218951
ranked.head()
in case it is not obvious ranked dataframe has two 'headers' the uppermost is
cons_all_rank
spanning over 4 cells (like merged cell in excel). It is a result of calling groupby()
and agg.([np.mean, np.std, np.median, len])
.
cons_all_rank
mean std median len
Cluster No
1 94.689655 106.622198 37.0 29
2 79.638889 101.176268 55.5 36
3 175.826087 133.990384 144.0 23
I performed
out = ranked.join(rep)
out[:3]
and received
(cons_all_rank, mean) (cons_all_rank, std) (cons_all_rank, median) (cons_all_rank, len) zinc_ID
Cluster No
1 94.689655 106.622198 37.0 29 ZINC08034993
2 79.638889 101.176268 55.5 36 ZINC32840901
3 175.826087 133.990384 144.0 23 ZINC08221075
Next I wanted to sort the out
dataframe by performing
out.sort(columns='(cons_all_rank, median)')
but I keep getting
KeyError: '(cons_all_rank, median)'
so I tried out.sort(columns='zinc_ID')
and it worked flawlessly.
I also tried
out.sort('cons_all_rank, median')
out.sort('median')
but neither is working.
Is there a problem with the multiline header in ranked
dataframe? If so, how can I get rid of it? ranked.drop('cons_all_rank', axis=0, inplace=True)
did not work.