5

Let's say I have the following series:

0    A
1    B
2    C
dtype: object

0    1
1    2
2    3
3    4
dtype: int64

How can I merge them to create an empty dataframe with every possible combination of values, like this:

   letter  number
0       A       1
1       A       2
2       A       3
3       A       4
4       B       1
5       B       2
6       B       3
7       B       4
8       C       1
9       C       2
10      C       3
11      C       4
bill
  • 111
  • 1
  • 2
  • 4
  • If you have a lot of data and need speed, check out this answer.... https://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays – run-out Feb 18 '19 at 07:59

4 Answers4

9

Assuming the 2 series are s and s1, use itertools.product() which gives a cartesian product of input iterables :

import itertools
df = pd.DataFrame(list(itertools.product(s,s1)),columns=['letter','number'])
print(df)

    letter  number
0       A       1
1       A       2
2       A       3
3       A       4
4       B       1
5       B       2
6       B       3
7       B       4
8       C       1
9       C       2
10      C       3
11      C       4
anky
  • 74,114
  • 11
  • 41
  • 70
4

As of Pandas 1.2.0, there is a how='cross' option in pandas.merge() that produces the Cartesian product of the columns.

import pandas as pd

letters = pd.DataFrame({'letter': ['A','B','C']})
numbers = pd.DataFrame({'number': [1,2,3,4]})

together = pd.merge(letters, numbers, how = 'cross')
   letter  number
0       A       1
1       A       2
2       A       3
3       A       4
4       B       1
5       B       2
6       B       3
7       B       4
8       C       1
9       C       2
10      C       3
11      C       4

As an additional bonus, this function makes it easy to do so with more than one column.

letters = pd.DataFrame({'letterA': ['A','B','C'],
                        'letterB': ['D','D','E']})
numbers = pd.DataFrame({'number': [1,2,3,4]})

together = pd.merge(letters, numbers, how = 'cross')
   letterA letterB  number
0        A       D       1
1        A       D       2
2        A       D       3
3        A       D       4
4        B       D       1
5        B       D       2
6        B       D       3
7        B       D       4
8        C       E       1
9        C       E       2
10       C       E       3
11       C       E       4
NickCHK
  • 1,093
  • 7
  • 17
2

If you have 2 Series s1 and s2. you can do this:

pd.DataFrame(index=s1,columns=s2).unstack().reset_index()[["s1","s2"]]

It will give you the follow

   s1  s2
0   A   1
1   B   1
2   C   1
3   A   2
4   B   2
5   C   2
6   A   3
7   B   3
8   C   3
9   A   4
10  B   4
11  C   4
Dawei
  • 1,046
  • 12
  • 21
  • 2
    Just to add that this method requires that the series be named first, and the subsequent slicing ```[["s1", "s2"]]``` be done using the names of the series – kerwei Feb 18 '19 at 07:11
1

You can use pandas.MultiIndex.from_product():

import pandas as pd

pd.DataFrame(
  index = pd.MultiIndex
    .from_product(
      [
        ['A', 'B', 'C'],
        [1, 2, 3, 4]
      ],
      names = ['letters', 'numbers']
    )
)

which results in a hierarchical structure:

letters numbers
      A       1
              2
              3
              4
      B       1
              2
              3
              4
      C       1
              2
              3
              4

and you can further call .reset_index() to get ungrouped results:

    letters numbers
0         A       1
1         A       2
2         A       3
3         A       4
4         B       1
5         B       2
6         B       3
7         B       4
8         C       1
9         C       2
10        C       3
11        C       4

(However I find @NickCHK's answer to be the best)

Daniel
  • 8,655
  • 5
  • 60
  • 87