Combining MultiIndex columns with similar root names in Pandas/Python

Question

I have a MultiIndex dataframe with the top level columns named:

Col1_1 | Col1_2 | Col2_1 | Col2_2 | ... |

I'm looking to combine Col1_1 with Col1_2 as Col1. I could also do this before creating the MultiIndex, but the original data is more drawn out as:

Col1_1.aspect1 | Col1_1.aspect2 | Col1_2.aspect1 | Col1_2.aspect2 | ... |

where 'aspect1' and 'aspect2' become subcolumns in the MultiIndex.
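
For reference, a minimal sketch of how flat names like these can be split into a two-level column index. The frame `flat` and its values are hypothetical; only the "Col1_1.aspect1" naming pattern comes from the question.

```python
import pandas as pd

# Hypothetical flat frame using the "<sample>.<aspect>" naming from the question.
flat = pd.DataFrame(
    [[1.0, 0.1, 3.0, 0.3], [2.0, 0.2, 4.0, 0.4]],
    columns=["Col1_1.aspect1", "Col1_1.aspect2", "Col1_2.aspect1", "Col1_2.aspect2"],
)

# Split each name on the first "." into a (top level, subcolumn) tuple
# and use the tuples as a two-level column MultiIndex.
df = flat.copy()
df.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split(".", 1)) for name in flat.columns]
)
```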

Please let me know if I can clarify anything, and many thanks in advance.

The expected result combines the two (Sample1_1 and Sample1_2 in my actual data) as just Sample1. Any approach is fine, including stacking/concatenating the data or outputting a summary statistic such as mean().

I've previously found similar questions, e.g. http://stackoverflow.com/questions/41221079/rename-multiindex-columns-in-pandas , but I don't believe this is quite right for this problem. — metaditch, Jan 26 '17 at 17:19
again, share a sample and share an example of your expected result — Zeugma, Jan 26 '17 at 17:20
:) we commented ~simultaneously as you can see by the time stamps, I didn't ignore your share request. I've uploaded a snip of the df (it contains hundreds of cols, thousands of rows). Many outputs would work here, as noted above. Thanks. — metaditch, Jan 26 '17 at 17:33
thanks, so basically what does it mean with your actual columns? Are you trying to like merge gtype, score etc. columns from sample11 and sample12 in one unique column? or something else — Zeugma, Jan 26 '17 at 17:36
I'm trying to merge Gtype from Sample1_1 and Sample1_2 into a single sub-column, Gtype under Sample1. Ditto for the other second level columns, Score, Theta, etc. It would also work fine to just output the mean of all values in a new table (GType doesn't matter to me in this case so nan is fine, but the column with numbers do). Thanks for the assist; I hope I'm not overly complicating this question. — metaditch, Jan 26 '17 at 17:41

score 2 · Answer 1 · answered Jan 26 '17 at 17:50

You can use groupby and apply an aggregation function such as mean. Group along axis 1 (the columns) on level 1 (the lower level of the column MultiIndex). This groups each subcolumn across all samples, so every second-level column is aggregated over all top-level columns. If a mean is what you want:

df.groupby(level=1, axis=1).mean()
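
A runnable sketch of this idea on made-up numbers may help. The sample and subcolumn names follow the comments on the question, but the values are hypothetical. Two assumptions are worth flagging: the grouping is written through a transpose, because the `axis=1` keyword of `groupby` is, to my knowledge, deprecated in pandas 2.1 and later; and the second half is a variant not given in the answer, which also groups by the root of the sample name (the part before the last underscore), so that Sample1_1 and Sample1_2 collapse into Sample1.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data shaped like the question's frame.
columns = pd.MultiIndex.from_product(
    [["Sample1_1", "Sample1_2", "Sample2_1", "Sample2_2"], ["Score", "Theta"]]
)
df = pd.DataFrame(np.arange(16, dtype=float).reshape(2, 8), columns=columns)

# Same grouping as the one-liner above, written via a transpose:
# one mean per subcolumn, taken across all samples.
by_aspect = df.T.groupby(level=1).mean().T

# Variant (an assumption about the desired output, not part of the answer):
# strip the trailing "_<n>" from the top level and group by (root, subcolumn).
roots = df.columns.get_level_values(0).map(lambda name: name.rsplit("_", 1)[0])
aspects = df.columns.get_level_values(1)
by_root = df.T.groupby([roots, aspects]).mean().T
```

Here `by_aspect` has one column per subcolumn (Score, Theta), while `by_root` keeps a two-level column index with Sample1 and Sample2 on top.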

Zeugma


Thanks! Knowing 'levels' is really helpful for a Python newbie like myself. This is a great start, though I receive: 'DataError: No numeric types to aggregate'. I'll see if I can find the issue then add it here and mark accepted to close it out. – metaditch Jan 26 '17 at 18:13
That's likely because of your text columns: filter them out of the dataframe before grouping. – Zeugma Jan 26 '17 at 20:01
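
The filtering suggested in the last comment could look roughly like this. It is a sketch on hypothetical data: the GType and Score names come from the comments above, the values are invented, and `select_dtypes` is one possible way to drop the text columns rather than the method the commenter necessarily had in mind.

```python
import pandas as pd

# Hypothetical frame mixing a text subcolumn (GType) with a numeric one (Score).
columns = pd.MultiIndex.from_product([["Sample1_1", "Sample1_2"], ["GType", "Score"]])
df = pd.DataFrame(
    [["AA", 0.9, "AB", 0.7], ["BB", 0.5, "BB", 0.3]],
    columns=columns,
)

# Keep only the numeric columns so that mean() has something to aggregate.
numeric = df.select_dtypes(include="number")

# Group the remaining subcolumns across samples (transpose form of axis=1 grouping).
means = numeric.T.groupby(level=1).mean().T
```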
