I have a MultiIndex dataframe with the top level columns named:

Col1_1 | Col1_2 | Col2_1 | Col2_2 | ... |

I'm looking to combine Col1_1 with Col1_2 as Col1. I could also do this before creating the MultiIndex, but the original data is more drawn out as:

Col1_1.aspect1 | Col1_1.aspect2 | Col1_2.aspect1 | Col1_2.aspect2 | ... |

where 'aspect1' and 'aspect2' become subcolumns in the MultiIndex.

Please let me know if I can clarify anything, and many thanks in advance.

Current df

The expected result combines the two as just Sample1; any number of ways is fine, including stacking/concatenating the data, outputting a summary stat e.g. mean(), etc.

metaditch
  • share df.head() – Zeugma Jan 26 '17 at 17:18
  • I've previously found similar questions, e.g. http://stackoverflow.com/questions/41221079/rename-multiindex-columns-in-pandas , but I don't believe this is quite right for this problem. – metaditch Jan 26 '17 at 17:19
  • again, share a sample and share an example of your expected result – Zeugma Jan 26 '17 at 17:20
  • :) we commented ~simultaneously as you can see by the time stamps, I didn't ignore your share request. I've uploaded a snip of the df (it contains hundreds of cols, thousands of rows). Many outputs would work here, as noted above. Thanks. – metaditch Jan 26 '17 at 17:33
  • Thanks. So what does this mean in terms of your actual columns? Are you trying to merge the gtype, score, etc. columns from sample11 and sample12 into one column, or something else? – Zeugma Jan 26 '17 at 17:36
  • I'm trying to merge Gtype from Sample1_1 and Sample1_2 into a single sub-column, Gtype, under Sample1. Ditto for the other second-level columns: Score, Theta, etc. It would also work fine to just output the mean of all values in a new table (Gtype doesn't matter to me in this case, so NaN is fine, but the columns with numbers do). Thanks for the assist; I hope I'm not overly complicating this question. – metaditch Jan 26 '17 at 17:41

1 Answer

You can use groupby and apply an aggregation function such as mean. Group along axis 1 (the columns) on level 1 (the lower level of the column MultiIndex); this groups each sub-column across all samples. Then take the mean, if that is what you want to achieve:

df.groupby(level=1, axis=1).mean()
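
To make the effect of that one-liner concrete, here is a minimal sketch on a made-up frame. The sample names and values are hypothetical, not the asker's data. It uses a transpose instead of `axis=1`, on the assumption that you may be running a recent pandas release where the `axis` argument of `groupby` is deprecated; on older versions the one-liner above should give the same table.

```python
import pandas as pd

# Hypothetical toy frame: two samples, each with Score and Theta sub-columns.
columns = pd.MultiIndex.from_product(
    [["Sample1_1", "Sample1_2"], ["Score", "Theta"]]
)
df = pd.DataFrame(
    [[1.0, 10.0, 3.0, 30.0],
     [2.0, 20.0, 4.0, 40.0]],
    columns=columns,
)

# Same idea as df.groupby(level=1, axis=1).mean(): transpose so the column
# MultiIndex becomes the row index, group on its second level, transpose back.
means = df.T.groupby(level=1).mean().T
print(means)
```

Each output column (Score, Theta) holds the row-wise mean of that sub-column over every sample, so the sample level itself disappears from the result.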
Zeugma
  • Thanks! Knowing 'levels' is really helpful for a Python newbie like myself. This is a great start, though I receive: 'DataError: No numeric types to aggregate'. I'll see if I can find the issue then add it here and mark accepted to close it out. – metaditch Jan 26 '17 at 18:13
  • That's likely because of your text columns: filter them out of the dataframe before grouping. – Zeugma Jan 26 '17 at 20:01
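
One possible way to act on that last comment, sketched under assumptions: the frame below is invented, with a text Gtype column and a numeric Score column per sample, and it assumes the replicate suffix is whatever follows the last underscore in the top-level name (so Sample1_1 and Sample1_2 both map to Sample1). `select_dtypes` drops the text columns, `rename(..., level=0)` strips the suffix, and the grouping then keeps both levels so the result still has Sample1 on top.

```python
import pandas as pd

# Hypothetical data: Gtype is text, Score is numeric.
columns = pd.MultiIndex.from_product(
    [["Sample1_1", "Sample1_2"], ["Gtype", "Score"]]
)
df = pd.DataFrame(
    [["AA", 1.0, "AB", 3.0],
     ["BB", 2.0, "BB", 4.0]],
    columns=columns,
)

# Keep only numeric columns so the mean has something it can aggregate.
numeric = df.select_dtypes(include="number")

# Assumption: the part after the last "_" is a replicate suffix to drop.
numeric = numeric.rename(columns=lambda name: name.rsplit("_", 1)[0], level=0)

# Group on both column levels (via a transpose) and average the replicates.
combined = numeric.T.groupby(level=[0, 1]).mean().T
print(combined)
```

If stacking the replicates is preferred over averaging them, the same renamed frame could be reshaped instead; the mean is used here only because the question says a summary statistic would be acceptable.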