Create all possible combinations of multiple columns in a Pandas DataFrame

Question

Given the below DataFrame,

df = pd.DataFrame({'Student':['Siri','Alexa'], 'Class':['6', '7'], 'Section':['A','B'], 'Teacher':['Ravi','Mark'], 'School':['C','D']})

I would like to return a table with all possible combinations of the values of all columns. I achieved this, using the below code

df['key'] = 1

df1 = pd.merge(df.loc[:, ['key','Student']], df.loc[:,['key','Class']], how='outer')

df2 = df1.merge(df.loc[:,['key','Section']], how='outer')

df3 = df2.merge(df.loc[:,['key','Teacher']], how='outer')

df4 = df3.merge(df.loc[:,['key','School']], how='outer')

df4.drop(columns='key', inplace=True)

What is the simplest way to get this done, as I have 15 columns and through the above method, it would result in 14 merges and inefficient code?

[This answer](https://stackoverflow.com/questions/23668427/pandas-joining-multiple-dataframes-on-columns/23671390#answers) may help show you how to set it up as an loop — lwileczek, Apr 23 '18 at 12:30

score 6 · Answer 1 · answered Nov 17 '19 at 17:36

You meen like:

from itertools import product

df = pd.DataFrame({'Student':['Siri','Alexa'], 'Class':['6', '7'], 'Section':['A','B'], 'Teacher':['Ravi','Mark'], 'School':['C','D']})

uniques = [df[i].unique().tolist() for i in df.columns ]
pd.DataFrame(product(*uniques), columns = df.columns)

that results in a Cartesian product of all unique entries in each columns.

   Student Class Section Teacher School
0     Siri     6       A    Ravi      C
1     Siri     6       A    Ravi      D
2     Siri     6       A    Mark      C
3     Siri     6       A    Mark      D
4     Siri     6       B    Ravi      C
5     Siri     6       B    Ravi      D
6     Siri     6       B    Mark      C
7     Siri     6       B    Mark      D
8     Siri     7       A    Ravi      C
9     Siri     7       A    Ravi      D
10    Siri     7       A    Mark      C
11    Siri     7       A    Mark      D
12    Siri     7       B    Ravi      C
13    Siri     7       B    Ravi      D
14    Siri     7       B    Mark      C
15    Siri     7       B    Mark      D
16   Alexa     6       A    Ravi      C
17   Alexa     6       A    Ravi      D
18   Alexa     6       A    Mark      C
19   Alexa     6       A    Mark      D
20   Alexa     6       B    Ravi      C
21   Alexa     6       B    Ravi      D
22   Alexa     6       B    Mark      C
23   Alexa     6       B    Mark      D
24   Alexa     7       A    Ravi      C
25   Alexa     7       A    Ravi      D
26   Alexa     7       A    Mark      C
27   Alexa     7       A    Mark      D
28   Alexa     7       B    Ravi      C
29   Alexa     7       B    Ravi      D
30   Alexa     7       B    Mark      C
31   Alexa     7       B    Mark      D

Nice solution, but what if one of the columns in `df` has more unique values than the other columns? — Michael, Jun 19 '20 at 11:00
It will still work, he's building a list of lists, not a matrix, so they can have different lengths. — cabo, Jul 23 '21 at 14:22
I got this error - raise TypeError("data argument can't be an iterator") TypeError: data argument can't be an iterator — omer, Jan 11 '22 at 20:21

jpp · Answer 2 · 2018-04-23T17:05:41.257

4

You are looking for a Cartesian product. This is possible via itertools.product:

from itertools import product

prod = product(df['Class'].unique(), df['Section'].unique())
student_cols = [x for x in df.columns if x not in ('Class', 'Section')]
students = df[student_cols].drop_duplicates().values.tolist()

res = pd.DataFrame([s + list(p) for p in prod for s in students],
                   columns=list(student_cols+['Class', 'Section']))\
        .sort_values(list(student_cols+['Class', 'Section']))

print(res)

#   School Student Teacher Class Section
# 0      C    Siri    Ravi     6       A
# 2      C    Siri    Ravi     6       B
# 4      C    Siri    Ravi     7       A
# 6      C    Siri    Ravi     7       B
# 1      D   Alexa    Mark     6       A
# 3      D   Alexa    Mark     6       B
# 5      D   Alexa    Mark     7       A
# 7      D   Alexa    Mark     7       B

edited Apr 23 '18 at 17:05

answered Apr 23 '18 at 12:22

jpp

159,742
34
281
339

Thanks, this will work, but I have 15 columns in my DataFrame and hence, it would be a lengthy process even through this method, if at all I didn't get it wrong. – Narendra Gadidasu Apr 23 '18 at 12:33
@NarendraGadidasu, In that case, you should [edit](https://stackoverflow.com/posts/49980763/edit) your question to include, say, 5 columns - and mention you have 15 columns. Then people will be aware of this fact and we can account for this. – jpp Apr 23 '18 at 12:38
Edited the question. Thanks! – Narendra Gadidasu Apr 23 '18 at 13:05
@NarendraGadidasu, So can you update your dataframe with 5 columns and also add desired output? – jpp Apr 23 '18 at 13:06
Added an example with 5 columns – Narendra Gadidasu Apr 23 '18 at 14:05
@NarendraGadidasu, See update. This will work for any number of columns. – jpp Apr 23 '18 at 17:05

Create all possible combinations of multiple columns in a Pandas DataFrame

2 Answers2

Linked