
I have a utility function for creating a Pandas MultiIndex from two or more iterables, with one index key for each unique combination of their values. It looks like this:

import pandas as pd
import itertools

def product_index(values, names=None):
    """Make a MultiIndex from the combinatorial product of the values."""
    iterable = itertools.product(*values)
    idx = pd.MultiIndex.from_tuples(list(iterable), names=names)
    return idx

It can be used like this:

a = range(3)
b = list("ab")
product_index([a, b])

This creates:

MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

This works perfectly fine, but it seems like a common use case, and I am surprised I had to implement it myself. So my question is: what have I missed or misunderstood in the Pandas library itself that offers this functionality?

Edit to add: This function has been added to Pandas as MultiIndex.from_product for the 0.13.1 release.
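For later readers, here is a minimal sketch of how the built-in constructor compares with the helper above. The level names "num" and "letter" are made up for illustration and are not part of the original question.

```python
import itertools

import pandas as pd

a = range(3)
b = list("ab")

# Built-in constructor (available since pandas 0.13.1).
# The level names below are hypothetical, chosen only for this example.
idx = pd.MultiIndex.from_product([a, b], names=["num", "letter"])

# The same index built by hand, mirroring the product_index helper.
manual = pd.MultiIndex.from_tuples(
    list(itertools.product(a, b)), names=["num", "letter"]
)

# True when both indexes hold the same keys in the same order.
same = idx.equals(manual)
print(same)
```

Both routes should give the same six keys in the same order, so the helper can likely be dropped in favour of from_product.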

mwaskom

1 Answer


This is a very similar construction, but it uses cartesian_product, which is faster than itertools.product for larger arrays:

In [2]: from pandas.tools.util import cartesian_product

In [3]: MultiIndex.from_arrays(cartesian_product([range(3),list('ab')]))
Out[3]: 
MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

This could be added as a convenience method, maybe MultiIndex.from_iterables(...).

Please open an issue (and a PR if you'd like).

FYI, I very rarely construct a MultiIndex 'manually'; it is almost always easier to construct a frame and just call set_index.

In [10]: df = DataFrame(dict(A = np.arange(6), 
                             B = ['foo'] * 3 + ['bar'] * 3, 
                             C = np.ones(6)+np.arange(6)%2)
                       ).set_index(['C','B']).sortlevel()

In [11]: df
Out[11]: 
       A
C B     
1 bar  4
  foo  0
  foo  2
2 bar  3
  bar  5
  foo  1

[6 rows x 1 columns]
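As a hedged sketch for recent pandas versions, the same frame-then-set_index approach can be written with explicit imports. It assumes sort_index() as the stand-in for sortlevel(), which as far as I know is no longer available in current releases.

```python
import numpy as np
import pandas as pd

# Build an ordinary frame first; the future index levels start out as columns.
df = pd.DataFrame(
    {
        "A": np.arange(6),
        "B": ["foo"] * 3 + ["bar"] * 3,
        "C": np.ones(6) + np.arange(6) % 2,
    }
)

# Promote the columns to a MultiIndex, then sort by the index levels.
# sort_index() is used here on the assumption that sortlevel() is unavailable.
df = df.set_index(["C", "B"]).sort_index()
print(df)
```

The level values of C come out as floats (1.0 and 2.0) here, since they are produced by np.ones.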
Jeff
  • Can you show a small example of constructing the DataFrame and then setting the index? In other words, is there a better way than the responses to this question: http://stackoverflow.com/questions/12390336/how-to-fill-the-missing-record-of-pandas-dataframe-in-pythonic-way – Paul H Jan 23 '14 at 19:45
  • That's a reasonable point. I mostly use it when I'm setting up an empty DataFrame that will get filled in pieces as I iterate through the elements of the index. – mwaskom Jan 23 '14 at 19:46
  • @mwaskom I would say that filling up an empty DataFrame (and iterating through the index) isn't idiomatic pandas\*... there *may* be a cleaner way to do it. \*pandastic/pandorable – Andy Hayden Jan 24 '14 at 05:52
  • That may be true (btw I am +1 on pandorable) but a lot of data that ends up in my DataFrames is generated by functions that don't necessarily speak pandas. – mwaskom Jan 24 '14 at 06:32
  • And also often the *labels* themselves are the inputs to the functions that produce the data. – mwaskom Jan 24 '14 at 06:39
  • Note that in current builds of pandas, `cartesian_product` is in `pandas.core.reshape.util` (not `pandas.tools.util`). – BeingQuisitive Jun 02 '17 at 15:14
  • Could this answer be updated to use `MultiIndex.from_product`? – Sunny Patel Jul 15 '20 at 06:30