34

If I have two lists

l1 = ['A', 'B']

l2 = [1, 2]

what is the most elegant way to get a pandas data frame which looks like:

+-----+-----+-----+
|     | l1  | l2  |
+-----+-----+-----+
|  0  | A   | 1   |
+-----+-----+-----+
|  1  | A   | 2   |
+-----+-----+-----+
|  2  | B   | 1   |
+-----+-----+-----+
|  3  | B   | 2   |
+-----+-----+-----+

Note, the first column is the index.

Mykola Zotko
K.Chen

4 Answers

51

Use product from itertools:

>>> import pandas as pd
>>> from itertools import product
>>> pd.DataFrame(list(product(l1, l2)), columns=['l1', 'l2'])
  l1  l2
0  A   1
1  A   2
2  B   1
3  B   2
behzad.nouri
  • Thank you, it works. I also adapted it to build the product from the unique values of two existing columns (after `from itertools import product`): `filter_df = pd.DataFrame(list(product(df['l1'].unique().tolist(), df['l2'].unique().tolist())), columns=['l1', 'l2'])` – Amir Jan 18 '23 at 09:21
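The comment above builds the product from the unique values of two existing columns. A minimal sketch of that idea, assuming a hypothetical frame `df` with columns `l1` and `l2` (the sample data is invented for illustration):

```python
from itertools import product

import pandas as pd

# Hypothetical input frame; only its distinct values per column matter here.
df = pd.DataFrame({'l1': ['A', 'B', 'A'], 'l2': [1, 1, 2]})

# unique() keeps values in order of first appearance.
pairs = product(df['l1'].unique(), df['l2'].unique())

# One row per (l1, l2) combination, whether or not it occurs in df.
filter_df = pd.DataFrame(list(pairs), columns=['l1', 'l2'])
print(filter_df)
```

The result contains every combination of the distinct values, including pairs such as ('B', 2) that never occur together in the input.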
21

As an alternative, you can use pandas' own cartesian_product. It is an internal helper under pd.core, so its location may change between versions, but it may be more useful with large NumPy arrays:

In [11]: lp1, lp2 = pd.core.reshape.util.cartesian_product([l1, l2])

In [12]: pd.DataFrame(dict(l1=lp1, l2=lp2))
Out[12]:
  l1  l2
0  A   1
1  A   2
2  B   1
3  B   2

This seems a little messy to read in to a DataFrame with the correct orient...

Note: previously cartesian_product was located at pd.tools.util.cartesian_product.

adir abargil
Andy Hayden
  • *atm there is a pd.MultiIndex.from_product, not sure how useful DataFrame constructor would be...* – Andy Hayden Sep 03 '14 at 04:45
  • 5
    As of pandas 0.20.2, `cartesian_product()` is in `pd.core.reshape.util`. This solution is faster than using `itertools.product`, and can be made even faster by initializing the dataframe with `np.array().T` of the non-unpacked data instead. – Ken Wei Jul 05 '17 at 09:18
  • This is an elegant solution and works just as easily for 3+ lists. I just used it quickly to find all combinations of 5 lists. Very nice! – Lenwood Oct 30 '18 at 17:03
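The `pd.MultiIndex.from_product` route mentioned in the comments stays within the public pandas API. A minimal sketch, assuming a pandas version in which `MultiIndex.to_frame(index=False)` is available:

```python
import pandas as pd

l1 = ['A', 'B']
l2 = [1, 2]

# Build every (l1, l2) combination as a MultiIndex, then turn the
# index levels into ordinary columns with a fresh 0..n-1 index.
index = pd.MultiIndex.from_product([l1, l2], names=['l1', 'l2'])
df = index.to_frame(index=False)
print(df)
```

Passing more lists (and matching names) to `from_product` extends this to three or more inputs.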
5

You can also use the sklearn library, which uses a NumPy-based approach:

import pandas as pd
from sklearn.utils.extmath import cartesian

df = pd.DataFrame(cartesian((l1, l2)))

For more verbose but possibly more efficient variants see Numpy: cartesian product of x and y array points into single array of 2D points.
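One such NumPy-only variant can be sketched with `np.repeat` and `np.tile`. This is an illustration for the two-list case, not the sklearn implementation; building each column separately lets the string and integer columns keep their own types:

```python
import numpy as np
import pandas as pd

l1 = ['A', 'B']
l2 = [1, 2]

df = pd.DataFrame({
    # Repeat each element of l1 once per element of l2: A, A, B, B
    'l1': np.repeat(l1, len(l2)),
    # Cycle through l2 once per element of l1: 1, 2, 1, 2
    'l2': np.tile(l2, len(l1)),
})
print(df)
```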

jpp
1

You can use the merge method with how='cross' (available since pandas 1.2):

df1 = pd.DataFrame(l1, columns=['l1'])
df2 = pd.DataFrame(l2, columns=['l2'])

df1.merge(df2, how='cross')

Output:

  l1  l2
0  A   1
1  A   2
2  B   1
3  B   2
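On pandas versions older than 1.2, where `how='cross'` is not available, the same result can be sketched by merging on a temporary constant column. The column name `_key` is a hypothetical placeholder chosen for this example:

```python
import pandas as pd

l1 = ['A', 'B']
l2 = [1, 2]

df1 = pd.DataFrame(l1, columns=['l1'])
df2 = pd.DataFrame(l2, columns=['l2'])

# Give both frames the same constant key so every row of df1
# matches every row of df2, then drop the helper column.
df = (
    df1.assign(_key=1)
       .merge(df2.assign(_key=1), on='_key')
       .drop(columns='_key')
)
print(df)
```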
Mykola Zotko