1

I have a PANDAS dataframe with three string columns that looks something like this:

Name     Surname    MiddleName
James    Bond       A
Maggie   Sweenie    B

I want to create a kind of outer join within the table so that every possible combination of Name, Surname and MiddleName is output. The output that I am looking for is:

Name     Surname    MiddleName
James    Bond       A
Maggie   Sweenie    B
James    Sweenie    A
James    Sweenie    B
Maggie   Bond       A
Maggie   Bond       B

Any ideas what's the most efficient way to do this ?

Number Logic
  • 852
  • 1
  • 9
  • 19

3 Answers3

4

IIUC using product

import itertools 
yourdf=pd.DataFrame(list(itertools.product(*df.values.T.tolist())),columns=df.columns)
yourdf
Out[937]: 
     Name  Surname MiddleName
0   James     Bond          A
1   James     Bond          B
2   James  Sweenie          A
3   James  Sweenie          B
4  Maggie     Bond          A
5  Maggie     Bond          B
6  Maggie  Sweenie          A
7  Maggie  Sweenie          B
BENY
  • 317,841
  • 20
  • 164
  • 234
1

You are looking for a kind of expand_grid functionality, which can be implemented with itertools.product(). From the pandas documentation , you can define expand_grid:

import itertools

def expand_grid(data_dict):
     rows = itertools.product(*data_dict.values())
     return pd.DataFrame.from_records(rows, columns=data_dict.keys())

expand_grid(df.to_dict('list'))  
Out[38]: 
     Name  Surname MidName
0   James     Bond       A
1   James     Bond       B
2   James  Sweenie       A
3   James  Sweenie       B
4  Maggie     Bond       A
5  Maggie     Bond       B
6  Maggie  Sweenie       A
7  Maggie  Sweenie       B 

Bin H.
  • 75
  • 1
  • 6
0

Without itertools:

pd.MultiIndex.from_product(df.T.values.tolist()).to_frame(index=False)
JoergVanAken
  • 1,286
  • 9
  • 10