I have a simple data frame with first and last names. I would like to get the equivalent of SQL SELF JOIN equivalent in pandas.
Here goes the full example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'first_name': ['Rose','Summer','Jane','Kim','Jack'],
'last_name': ['Howard','Solstice','Kim','Cruz','Rose'],
'customer_id': [1,2,3,4,5]})
df
first_name last_name customer_id
0 Rose Howard 1
1 Summer Solstice 2
2 Jane Kim 3
3 Kim Cruz 4
4 Jack Rose 5
REQUIRED OUTPUT
customer_id first_name last_name customer_id_1 first_name_1 last_name_1
1 Rose Howard 5 Jack Rose
4 Kim Cruz 3 Jane Kim
Using SQL
select a.first_name, a.last_name, b.first_name, b.last_name
from df as a, df as b
where a.first_name = b.last_name
My attempt
(pd.concat( [ df[['first_name','last_name']],
df[['first_name','last_name']].add_suffix('_1')
], axis=1, ignore_index=False)
)
first_name last_name first_name_1 last_name_1
0 Rose Howard Rose Howard
1 Summer Solstice Summer Solstice
2 Jane Rose Jane Rose
But,
(pd.concat( [ df,df.add_suffix('_1')], axis=1)
.query(" first_name == last_name_1 ")
)
This gives empty output to my surprise!!
I want two rows and fours columns as given by SQL.