2

I'm looking for the simplest way to create a DataFrame from two others so that it contains every combination of their elements. For instance, given these two DataFrames:

import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)

The result must be:

   0   1
0  A  x1
1  A  x2
2  A  x3
3  A  x4
4  A  x5
5  A  x6
6  A  x7
7  A  x8
8  B  x1
9  B  x2

I tried building the combinations from the lists directly. That works fine for small lists but not for large ones. Thank you.

Georgy
Mus
  • Can you elaborate a bit about the desired output? How did you end up with the desired output included in your question? – Grzegorz Skibinski May 25 '20 at 20:51
  • Does this answer your question? [Get all combinations of elements from two lists?](https://stackoverflow.com/questions/25634489/get-all-combinations-of-elements-from-two-lists) – Georgy May 29 '20 at 12:11

3 Answers

6

You can use itertools.product:

import itertools
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
result = pd.DataFrame(list(itertools.product(list1, list2)))
João Victor
  • Thanks for answering. That's what I did. But it does not work for big dataframes – Mus May 25 '20 at 21:04
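The comment above does not say how the itertools.product approach fails on large inputs, so the following is only a sketch of one alternative: building the two columns with NumPy's repeat and tile instead of a Python list of tuples. It is a different technique from the answer, and the names left and right are chosen here for illustration. The result still has len(list1) * len(list2) rows, so memory use grows with that product either way.

```python
import numpy as np
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

# Repeat each element of list1 once per element of list2: A, A, ..., B, B, ...
left = np.repeat(list1, len(list2))
# Cycle through list2 once per element of list1: x1, ..., x8, x1, ..., x8, ...
right = np.tile(list2, len(list1))

# Column labels 0 and 1 match the layout requested in the question.
result = pd.DataFrame({0: left, 1: right})
print(result.head(10))
```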
4
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)

# Give both frames the same constant key so every row matches every other row
df1['key'] = 0
df2['key'] = 0
print( df1.merge(df2, on='key', how='outer').drop(columns='key') )

Prints:

   0_x 0_y
0    A  x1
1    A  x2
2    A  x3
3    A  x4
4    A  x5
5    A  x6
6    A  x7
7    A  x8
8    B  x1
9    B  x2

...
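As a possible shortcut, assuming pandas 1.2 or later, DataFrame.merge accepts how='cross', which produces the same Cartesian product without adding a dummy key. The sketch below uses the hypothetical column names letter and code (not from the question) so that no _x/_y suffixes are needed.

```python
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

# "letter" and "code" are illustrative column names, not part of the question.
df1 = pd.DataFrame({"letter": list1})
df2 = pd.DataFrame({"code": list2})

# how="cross" pairs every row of df1 with every row of df2 (pandas >= 1.2).
result = df1.merge(df2, how="cross")
print(result.head(10))
```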
Andrej Kesely
3

You want to join each element in df1 with all elements of df2.

You can do it using df.merge:

In [1820]: df1['tmp'] = 1   ## Create a dummy key in df1
In [1821]: df2['tmp'] = 1   ## Create a dummy key in df2

## Merge both frames on `tmp`, drop the dummy key, then rename the suffixed columns
In [1824]: df1.merge(df2, on='tmp').drop(columns='tmp').rename(columns={'0_x': '0', '0_y': '1'})
Out[1824]: 
    0   1
0   A  x1
1   A  x2
2   A  x3
3   A  x4
4   A  x5
5   A  x6
6   A  x7
7   A  x8
8   B  x1
9   B  x2
10  B  x3
11  B  x4
12  B  x5
13  B  x6
14  B  x7
15  B  x8
16  C  x1
17  C  x2
18  C  x3
...
...
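The dummy-key merge above adds a tmp column to df1 and df2 themselves. A minimal sketch of the same idea that leaves the inputs untouched is shown below; the helper name cross_join and the default key name _tmp are hypothetical, chosen here for illustration.

```python
import pandas as pd


def cross_join(left: pd.DataFrame, right: pd.DataFrame, key: str = "_tmp") -> pd.DataFrame:
    """Pair every row of left with every row of right (hypothetical helper)."""
    # assign() returns new frames, so the callers' frames never gain the dummy column.
    merged = left.assign(**{key: 1}).merge(right.assign(**{key: 1}), on=key)
    return merged.drop(columns=key)


df1 = pd.DataFrame(["A", "B", "C", "D", "E"])
df2 = pd.DataFrame(["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"])

result = cross_join(df1, df2)
# Relabel the columns by position instead of renaming the suffixed names.
result.columns = range(result.shape[1])
print(result.head(10))
```

The key name must not collide with an existing column in either frame, which is why it is exposed as a parameter.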
halfer
Mayank Porwal