
Given an arbitrary number of lists, I'd like to produce a pandas DataFrame containing their Cartesian product. For example, given:

a = [1, 2, 3]
b = ['val1', 'val2']
c = [100, 101]

I'd like to end up with a DataFrame with columns a, b, and c, and all 3x2x2=12 combinations.

Unlike the question "cartesian product in pandas", I'm looking for the ability to provide more than two inputs, and I am not looking to pass DataFrames, which would involve keeping the values within each DataFrame row together rather than taking combinations of them. Answers to this question will likely not overlap with answers to that one.

Unlike the question "Cartesian product of x and y array points into single array of 2D points", I'm seeking a pandas DataFrame result with named columns, rather than a two-dimensional numpy array.

Max Ghenis
  • Appreciate your attempt to expand on the other answer and create a more generalized way to do this ;} . However, it is still too similar to the original answer, and therefore I'm marking as a dupe ! – rafaelc Oct 04 '19 at 19:24
  • There's also an extensive discussion of how to make a fast cartesian product in the link I added. Granted that deals with arrays, but one could simply do `pd.DataFrame(cartesian_product(*[np.array(x) for x in [a,b,c]]))` – ALollz Oct 04 '19 at 20:13
  • @rafaelc This isn't just a more generalized version of the "cartesian product in pandas" question. That question asks for a Cartesian product between two DataFrames, keeping values within the same DataFrame together rather than taking combinations of it. The answer I provided will not address that question. The numpy question is more similar, but the time test doesn't include the `pd.MultiIndex.from_product` approach I provided. – Max Ghenis Oct 04 '19 at 20:39
  • Also the numpy question doesn't involve column names in the result, which would require a second additional step. – Max Ghenis Oct 04 '19 at 20:48
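The additional step for column names mentioned in the comments above could look roughly like the sketch below. It is only an illustration: `cartesian_product_numpy` is a hypothetical name, not the `cartesian_product` function from the linked numpy question, and it builds index grids with `np.meshgrid` so that each column keeps its own dtype instead of being coerced into one mixed array.

```python
import numpy as np
import pandas as pd


def cartesian_product_numpy(d):
    # Hypothetical helper for illustration; not part of numpy or pandas.
    arrays = [np.asarray(values) for values in d.values()]
    # One integer index grid per input; "ij" ordering makes the first key
    # vary slowest, matching the row order asked for in the question.
    grids = np.meshgrid(*[np.arange(len(arr)) for arr in arrays], indexing="ij")
    # The column-naming step: pair each key with its expanded values.
    return pd.DataFrame(
        {key: arr[grid.ravel()] for key, arr, grid in zip(d.keys(), arrays, grids)}
    )


df = cartesian_product_numpy({'a': [1, 2, 3],
                              'b': ['val1', 'val2'],
                              'c': [100, 101]})
```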

1 Answer


Building on this answer to a related question (Cartesian product of two DataFrames), this function takes a dictionary of lists and returns the Cartesian product:

import pandas as pd

def cartesian_product(d):
    # Keys of d become the column names; each row is one combination of values.
    index = pd.MultiIndex.from_product(d.values(), names=d.keys())
    return pd.DataFrame(index=index).reset_index()

Example:

cartesian_product({'a': [1, 2, 3],
                   'b': ['val1', 'val2'],
                   'c': [100, 101]})
    a      b      c
0   1   val1    100
1   1   val1    101
2   1   val2    100
3   1   val2    101
4   2   val1    100
5   2   val1    101
6   2   val2    100
7   2   val2    101
8   3   val1    100
9   3   val1    101
10  3   val2    100
11  3   val2    101
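For comparison, the same table can likely be built without a MultiIndex by using `itertools.product` from the standard library. This is a minimal sketch, not the approach used above; the name `cartesian_product_itertools` is made up for illustration, and it starts from the three separate lists in the question.

```python
import itertools

import pandas as pd


def cartesian_product_itertools(d):
    # Hypothetical alternative for illustration; not part of pandas.
    # itertools.product yields one tuple per combination, with the last
    # input varying fastest.
    rows = list(itertools.product(*d.values()))
    return pd.DataFrame(rows, columns=list(d.keys()))


a = [1, 2, 3]
b = ['val1', 'val2']
c = [100, 101]

df = cartesian_product_itertools({'a': a, 'b': b, 'c': c})
```

Materializing every combination as a Python tuple may be slower than the MultiIndex approach for large inputs; no timings are claimed here.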

I've added this to my microdf package.

Max Ghenis