I have a dataframe df with a column Items that contains item names in alphabetical order:
Items
-----
Apple
Ball
Bar
Cat
I want to join the dataframe with itself to get two columns, Item_x and Item_y, such that each row holds a unique pair of items (x, y), where order is irrelevant. That is, the pair (Apple, Ball) is considered a duplicate of the pair (Ball, Apple). I only want to keep (Apple, Ball), because its items are in alphabetical order; (Ball, Apple) is unwanted and must be dropped.
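pd.merge(df, df, on='Items', how='outer')
does not work: joining on Items only matches each item with itself, and pairing every item with every other item (a cross join) gives extra unwanted pairs such as (Apple, Apple) and (Ball, Apple).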
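One possible approach, sketched here assuming pandas >= 1.2 (where how='cross' was added to merge): take the full cross join, which produces every ordered pair including the unwanted (Apple, Apple) and (Ball, Apple), then keep only rows where Item_x sorts strictly before Item_y. The renaming to Item_x / Item_y is just to match the column names in the question.

```python
import pandas as pd

df = pd.DataFrame({"Items": ["Apple", "Ball", "Bar", "Cat"]})

# Cross join: every item paired with every item (4 x 4 = 16 rows).
# Requires pandas >= 1.2, where how="cross" was introduced.
pairs = pd.merge(df, df, how="cross", suffixes=("_x", "_y"))

# The overlapping column gets the suffixes (Items_x / Items_y);
# rename to the Item_x / Item_y names used in the question.
pairs = pairs.rename(columns={"Items_x": "Item_x", "Items_y": "Item_y"})

# Keep a pair only when x sorts strictly before y. This drops self-pairs
# like (Apple, Apple) and reversed pairs like (Ball, Apple).
result = pairs[pairs["Item_x"] < pairs["Item_y"]].reset_index(drop=True)
print(result)
```

Note that the string comparison is plain lexicographic and case-sensitive, so mixed-case item names would need normalizing first.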
Question: How can I join a dataframe with itself on a column and keep only the rows where the two columns hold a unique pair of values in alphabetical order?
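An alternative that avoids building the full cross join is a sketch using itertools.combinations, assuming the items are meant to be treated as a set (duplicates are removed with unique()). combinations() emits each unordered pair exactly once and preserves input order, so sorting the items first yields pairs already in alphabetical order.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"Items": ["Apple", "Ball", "Bar", "Cat"]})

# Deduplicate and sort so each pair comes out as (earlier, later).
items = sorted(df["Items"].unique())

# combinations(items, 2) yields each unordered pair once, never (x, x).
result = pd.DataFrame(list(combinations(items, 2)), columns=["Item_x", "Item_y"])
print(result)
```

For n distinct items this produces n * (n - 1) / 2 rows directly, instead of materializing n * n rows and filtering most of them away.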