I have a dataframe such as:

COL1         COL2
Homo_sapiens Mus_musculus
Mus_musculus Homo_sapiens 
Droso_A      Droso_b
Droso_A      Droso_b
Betta_spe    Rattus_rattus
Betta_spe    Rattus_norvegirus 

How can I remove duplicated pairs of values across COL1 and COL2, no matter which column each value is in? In other words, I want to treat a row as a duplicate when the same couple of values has already appeared, in either order. Here is an example:

For instance, in the first row Homo_sapiens is in COL1 and Mus_musculus is in COL2.

Since the second row contains the same pair in reverse order (Mus_musculus in COL1 and Homo_sapiens in COL2), I keep only the first one:

COL1         COL2
Homo_sapiens Mus_musculus
Droso_A      Droso_b
Droso_A      Droso_b
Betta_spe    Rattus_rattus
Betta_spe    Rattus_norvegirus 

Then Droso_A and Droso_b form an ordinary duplicate row, which can be removed using:

df = df.drop_duplicates(subset = ["COL1","COL2"])

COL1         COL2
Homo_sapiens Mus_musculus
Droso_A      Droso_b
Betta_spe    Rattus_rattus
Betta_spe    Rattus_norvegirus 
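
To make this step reproducible, here is a minimal sketch, assuming pandas is imported as pd and the dataframe is rebuilt by hand from the first table. The variable name exact is only illustrative. Applied to the original dataframe, drop_duplicates alone removes the repeated Droso_A / Droso_b row but leaves the reversed Homo_sapiens / Mus_musculus pair in place, which is the part I am stuck on:

```python
import pandas as pd

# Example data copied from the first table in the question.
df = pd.DataFrame({
    "COL1": ["Homo_sapiens", "Mus_musculus", "Droso_A", "Droso_A",
             "Betta_spe", "Betta_spe"],
    "COL2": ["Mus_musculus", "Homo_sapiens", "Droso_b", "Droso_b",
             "Rattus_rattus", "Rattus_norvegirus"],
})

# Removes only rows that are identical in both columns, in the same order.
exact = df.drop_duplicates(subset=["COL1", "COL2"])

# 5 rows remain: the second Droso_A / Droso_b row is gone,
# but the reversed Mus_musculus / Homo_sapiens row is still there.
print(exact)
```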

Finally, the Betta_spe / Rattus_rattus and Betta_spe / Rattus_norvegirus rows do not have any duplicates, so both are kept:

COL1         COL2
Homo_sapiens Mus_musculus
Droso_A      Droso_b
Betta_spe    Rattus_rattus
Betta_spe    Rattus_norvegirus 
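
One possible way to get this result, offered as a sketch rather than a definitive answer: sort the two values inside each row so that a pair and its reversed version produce the same key, then drop rows whose key has already been seen. This assumes numpy and pandas are available. The names key and result are hypothetical helpers, not part of the original code.

```python
import numpy as np
import pandas as pd

# Example data copied from the first table in the question.
df = pd.DataFrame({
    "COL1": ["Homo_sapiens", "Mus_musculus", "Droso_A", "Droso_A",
             "Betta_spe", "Betta_spe"],
    "COL2": ["Mus_musculus", "Homo_sapiens", "Droso_b", "Droso_b",
             "Rattus_rattus", "Rattus_norvegirus"],
})

# Sort the two values within each row, so (A, B) and (B, A) share one key.
key = pd.DataFrame(
    np.sort(df[["COL1", "COL2"]].to_numpy(), axis=1),
    index=df.index,
)

# duplicated() marks every occurrence of a key after the first one;
# keeping the unmarked rows preserves the original column order.
result = df[~key.duplicated()]

print(result)
```

Because only the key is sorted, the rows that are kept still show their values in the original columns (Homo_sapiens stays in COL1 and Mus_musculus in COL2).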
chippycentra