I have a dataframe such as:
COL1 COL2
Homo_sapiens Mus_musculus
Mus_musculus Homo_sapiens
Droso_A Droso_b
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
How can I remove duplicated values across COL1 and COL2, no matter which column the values are in? In other words, I want to remove duplicate pairs of values regardless of their order. Here is an example:
For instance, Homo_sapiens is present in COL1 and Mus_musculus is in COL2. But since Homo_sapiens is also present in COL2 with Mus_musculus in COL1, I only keep the first one:
COL1 COL2
Homo_sapiens Mus_musculus
Droso_A Droso_b
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
Then, for Droso_A and Droso_b, it is a classic duplicate, which can be removed using:
df = df.drop_duplicates(subset=["COL1", "COL2"])
COL1 COL2
Homo_sapiens Mus_musculus
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
Then Betta_spe, Rattus_rattus and Rattus_norvegirus do not have any duplicates, so those rows are kept:
COL1 COL2
Homo_sapiens Mus_musculus
Droso_A Droso_b
Betta_spe Rattus_rattus
Betta_spe Rattus_norvegirus
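
For reference, the steps described above could be sketched as follows. This is only one possible approach, assuming pandas and NumPy are available: the two values in each row are sorted so that (A, B) and (B, A) produce the same key, and only the first row per key is kept. The names key and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Example dataframe rebuilt from the table in the question.
df = pd.DataFrame({
    "COL1": ["Homo_sapiens", "Mus_musculus", "Droso_A", "Droso_A",
             "Betta_spe", "Betta_spe"],
    "COL2": ["Mus_musculus", "Homo_sapiens", "Droso_b", "Droso_b",
             "Rattus_rattus", "Rattus_norvegirus"],
})

# Sort the two values within each row so that (A, B) and (B, A)
# give the same order-independent key.
pairs = df[["COL1", "COL2"]].to_numpy(dtype=object)
key = pd.DataFrame(np.sort(pairs, axis=1), index=df.index)

# Keep only the first row for each order-independent pair.
result = df[~key.duplicated(keep="first")].reset_index(drop=True)
print(result)
```

On this example data, the sketch leaves four rows: the reversed Homo_sapiens/Mus_musculus row and the repeated Droso_A/Droso_b row are both dropped in a single pass, so the separate drop_duplicates call is not needed.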