30

I've spent hours browsing everywhere now to try to create a multiindex from dataframe in pandas. This is the dataframe I have (posting excel sheet mockup. I do have this in pandas dataframe):

have

And this is what I want:

want

I have tried

newmulti = currentDataFrame.set_index(['user_id','account_num'])

But it returns a dataframe, not a multiindex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial but I've read so many posts, tutorials, etc. and still could not figure it out. Partly because I'm a very visual person and most posts are not. Please help!

puifais
  • 738
  • 2
  • 9
  • 20
  • For processing purposes, both the tables are the same. But for display purposes, I suggest you to refer to: [stackoverflow.com/a/25127764/2306662](https://stackoverflow.com/a/25127764/2306662) – nikpod Jun 08 '17 at 18:30
  • But I thought I need multi-index if, say I want to plot total sales (of all account) vs. dates? – puifais Jun 08 '17 at 18:32
  • @puifais why can't you plot the second dataframe you've put together? – Andrew L Jun 08 '17 at 18:51

5 Answers5

35

You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.

df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()

You should also be able to simply do this:

df.set_index(['user_id', 'account_num', 'dates'])

Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num and date values but different sales figures) by summing them, which is why I recommended using groupby.

If you need the multi-index, you can simply access viat new_df.index where new_df is the new dataframe created from either of the two operations above.

And user_id will be level 0 and account_num will be level 1.

Alexander
  • 105,104
  • 32
  • 201
  • 196
  • So this means, group by user_id, account_num, and dates and pull out sales data. if sales data so happen to have the same user_id, account_num, and dates, then sum them. is that right? – puifais Jun 09 '17 at 20:54
  • 1
    Sort of... It means you are aggregating the sales data via `sum`. If the column was not numeric, you wouldn't be able to sum it. You would have to use something like `first`, `last` or `unique` with a lambda function. – Alexander Jun 09 '17 at 21:13
  • What if I have more columns, other than sales? – Shravya Boggarapu Mar 20 '20 at 03:15
7

For clarification of future users I would like to add the following:

As said by Alexander,

df.set_index(['user_id', 'account_num', 'dates'])

with a possible inplace=True does the job.

The type(df) gives

pandas.core.frame.DataFrame

whereas type(df.index) is indeed the expected

pandas.core.indexes.multi.MultiIndex
Eulenfuchswiesel
  • 879
  • 9
  • 20
3

Use pd.MultiIndex.from_arrays

lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values

midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=['level 0', 'level 1'])
piRSquared
  • 285,575
  • 57
  • 475
  • 624
2

There are two ways to do it, albeit not exactly like you have shown, but it works.
Say you have the following df:

      A   B    C      D
0   nil one    1    NaN
1   bar one    5    5.0
2   foo two    3    8.0
3   bar three  2    1.0
4   foo two    4    2.0
5   bar two    6    NaN

1. Workaround 1:

df.set_index('A', append = True, drop = False).reorder_levels(order = [1,0]).sort_index()

This will return:

enter image description here

2. Workaround 2:

df.set_index(['A', 'B']).sort_index()

This will return:
enter image description here

user41855
  • 917
  • 8
  • 15
1

The DataFrame returned by currentDataFrame.set_index(['user_id','account_num']) has it's index set to ['user_id','account_num']

newmulti.index will return the MultiIndex object.

  • 1
    Um...I don't understand. So do I do `newmulti = currentDataFrame.set_index(['user_id','account_num'])` and then `newmultiReal = newmulti.index`? Would you please clarify? I'm totally new at pandas. – puifais Jun 08 '17 at 18:33
  • What is your end goal? If you want the original dataframe with a multiindex, you've already got it. –  Jun 08 '17 at 18:46