I have the following dataframe:
import pandas as pd

data = {
    'person1_name': ['John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne', 'Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne'],
    'family1_name': ['Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne'],
    'person2_name': ['Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne', 'John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne'],
    'family2_name': ['Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne']
}
df = pd.DataFrame(data)
person1_name      family1_name  person2_name      family2_name
John_Ethan_Wayne  Wayne         Michael_Wayne     Wayne
John_Ethan_Wayne  Wayne         Patrick_Wayne     Wayne
Michael_Wayne     Wayne         Patrick_Wayne     Wayne
Michael_Wayne     Wayne         John_Ethan_Wayne  Wayne
Patrick_Wayne     Wayne         John_Ethan_Wayne  Wayne
Patrick_Wayne     Wayne         Michael_Wayne     Wayne
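I want to drop duplicate pairs of (person1_name, family1_name) and (person2_name, family2_name), ignoring the direction of the relation. In other words, a row relating A to B should count as a duplicate of a row relating B to A.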
The final result should be:
person1_name      family1_name  person2_name      family2_name
John_Ethan_Wayne  Wayne         Michael_Wayne     Wayne
Michael_Wayne     Wayne         Patrick_Wayne     Wayne
Patrick_Wayne     Wayne         John_Ethan_Wayne  Wayne
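
For reference, here is a rough sketch of one way this could be approached. It assumes that "ignoring direction" means two rows are duplicates whenever they contain the same two (name, family) pairs in either order. The helper name `pair_key` and the variables `seen`, `keep`, and `result` are made up for illustration. Each row gets an order-independent key (the two pairs, sorted), and only the first row seen for each key is kept. Because the first occurrence is kept, the John/Patrick pair survives as John -> Patrick rather than Patrick -> John as in the listing above; the set of unordered pairs is the same.

```python
import pandas as pd

data = {
    'person1_name': ['John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne',
                     'Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne'],
    'family1_name': ['Wayne'] * 6,
    'person2_name': ['Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne',
                     'John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne'],
    'family2_name': ['Wayne'] * 6,
}
df = pd.DataFrame(data)


def pair_key(p1, f1, p2, f2):
    """Hypothetical helper: a key that is identical for A->B and B->A."""
    return tuple(sorted([(p1, f1), (p2, f2)]))


seen = set()   # keys already encountered
keep = []      # one boolean per row: True if this is the first row with its key
for p1, f1, p2, f2 in zip(df['person1_name'], df['family1_name'],
                          df['person2_name'], df['family2_name']):
    key = pair_key(p1, f1, p2, f2)
    keep.append(key not in seen)
    seen.add(key)

# Boolean-mask the original frame, then renumber the surviving rows.
result = df[pd.Series(keep, index=df.index)].reset_index(drop=True)
print(result)
```

If a particular orientation of each pair matters, the rows could be reordered before building `keep`, since the loop always retains whichever row comes first.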