I'm starting with a dictionary like this:

dict = {(100000550L, u'ActivityA'): {'bar__sum': 14.0, 'foo__sum': 12.0},
        (100001799L, u'ActivityB'): {'bar__sum': 7.0, 'foo__sum': 3.0}}

When I convert it to a DataFrame and transpose it, the (id, activity type) tuples end up as the row index:

df = DataFrame(dict).transpose()

                        bar__sum  foo__sum
(100000550, ActivityA)        14        12
(100001799, ActivityB)         7         3

How can I convert the tuples in the index to a MultiIndex, i.e. so that the end result looks like this instead:

                        bar__sum  foo__sum
id        act_type
100000550 ActivityA        14        12
100001799 ActivityB         7         3

What's the best way to do this? Is there some option at DataFrame creation that I'm missing? Or should it happen via a list comprehension, which feels inefficient to me?

Roman Pekar
Twain

1 Answer

If you want to convert the existing index of your DataFrame:

>>> df.index = pd.MultiIndex.from_tuples(df.index)
>>> df
                     bar__sum  foo__sum
100000550 ActivityA        14        12
100001799 ActivityB         7         3

>>> df.index.names = ['id', 'act_type']
>>> df
                     bar__sum  foo__sum
id        act_type                     
100000550 ActivityA        14        12
100001799 ActivityB         7         3
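
For reference, here is a minimal self-contained sketch of the same conversion on Python 3. It assumes a recent pandas, which may already turn tuple keys into a MultiIndex on its own, so the flat tuple index is built explicitly with `pd.Index(..., tupleize_cols=False)` to mirror the situation in the question. The names `d` and `flat_index` are only illustrative.

```python
import pandas as pd

# Same data as in the question, written as Python 3 literals.
d = {
    (100000550, "ActivityA"): {"bar__sum": 14.0, "foo__sum": 12.0},
    (100001799, "ActivityB"): {"bar__sum": 7.0, "foo__sum": 3.0},
}

# Assumption: force a flat Index of tuples so that the starting point
# matches the question (tupleize_cols=False keeps the tuples as plain objects).
flat_index = pd.Index(list(d.keys()), tupleize_cols=False)
df = pd.DataFrame(list(d.values()), index=flat_index)

# Convert the tuple index to a MultiIndex and name the levels in one step.
df.index = pd.MultiIndex.from_tuples(df.index, names=["id", "act_type"])
print(df)
```

Passing `names=` to `from_tuples` does the same job as assigning `df.index.names` afterwards.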

You can also create the DataFrame directly from the dictionary (d is your dict; don't name your variable dict, since that shadows Python's built-in dict type):

>>> pd.DataFrame(d.values(), index=pd.MultiIndex.from_tuples(d.keys(), names=['id', 'act_type']))
                     bar__sum  foo__sum
id        act_type                     
100001799 ActivityB         7         3
100000550 ActivityA        14        12

Note that d.values() and d.keys() iterate in the same order as long as the dictionary isn't modified in between, so the rows and the index entries line up.
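
On Python 3, `d.values()` and `d.keys()` return view objects rather than lists, so a hedged variant of the direct construction wraps both in `list(...)`. This is a sketch assuming a recent pandas; the helper name `frame_from_tuple_dict` is hypothetical and not part of pandas.

```python
import pandas as pd


def frame_from_tuple_dict(d, names):
    """Build a DataFrame with a MultiIndex from a dict keyed by tuples.

    Hypothetical helper for illustration; `d` maps tuples to row dicts.
    """
    # keys() and values() iterate in matching order on an unmodified dict,
    # so each row is paired with its own key.
    index = pd.MultiIndex.from_tuples(list(d.keys()), names=names)
    return pd.DataFrame(list(d.values()), index=index)


d = {
    (100000550, "ActivityA"): {"bar__sum": 14.0, "foo__sum": 12.0},
    (100001799, "ActivityB"): {"bar__sum": 7.0, "foo__sum": 3.0},
}

df = frame_from_tuple_dict(d, names=["id", "act_type"])
print(df)
```

The row order follows the dictionary's iteration order, so it can differ from the output shown above, which came from an older Python where dict order was arbitrary.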

Roman Pekar
  • Nice trick passing only `d.values()` as the argument! I was trying to figure out something to get access to the post-sorted index after passing `d`, but this way you don't need it at all. – DSM Nov 22 '13 at 19:57
  • Using Python 3.6 and pandas 0.23.1, `d.values()` isn't an acceptable data type for creating the DataFrame. Casting `d.values()` to a list fixes the issue: `pd.DataFrame(list(d.values()), index=pd.MultiIndex.from_tuples(d.keys(), names=['id', 'act_type']))` should do the trick. – kindjacket Jul 10 '18 at 15:45