I have a data frame and I am trying to find all possible combinations of a fraction of it with the whole of it. The data frames below are a much scaled-down version of the ones I am running. The first data frame (fruit1) is a fraction of the second data frame (fruit2).
FruitSubDF  FruitFullDF
apple       apple
cherry      cherry
banana      banana
            peach
            plum
Running the following code (with product imported from itertools and pandas imported as pd):
df1 = pd.DataFrame(list(product(fruitDF.iloc[0:3,0], fruitDF.iloc[0:5,0])), columns=['fruit1', 'fruit2'])
gives this output:
fruit1 fruit2
0 apple banana
1 apple apple
2 apple cherry
3 apple peach
4 apple plum
5 cherry banana
6 cherry apple
7 cherry cherry
.
.
18 banana banana
19 banana peach
20 banana plum
My problem is that I want to remove rows that contain the same two fruits, regardless of which fruit is in which column, as shown below. In other words, I consider (apple, cherry) and (cherry, apple) to be the same pair. I am unsure of an efficient way to weed out the unwanted rows without resorting to iterrows, since most pandas functions I have found remove duplicates based on column order.
fruit1 fruit2
0 apple banana
1 apple cherry
2 apple apple
3 apple peach
4 apple plum
5 cherry banana
6 cherry cherry
.
.
15 banana plum
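
One possible way to get this result without iterrows is to build an order-independent key for each row (the two fruits sorted alphabetically) and drop rows whose key has already been seen. This is only a sketch: the one-column fruitDF below is an assumed stand-in for the real frame, and key and deduped are names made up for illustration.

```python
from itertools import product

import numpy as np
import pandas as pd

# Assumed stand-in for the real data: a single column with the full fruit list.
fruitDF = pd.DataFrame({'fruit': ['apple', 'cherry', 'banana', 'peach', 'plum']})

# Same cross product as in the question: first 3 fruits against all 5.
df1 = pd.DataFrame(
    list(product(fruitDF.iloc[0:3, 0], fruitDF.iloc[0:5, 0])),
    columns=['fruit1', 'fruit2'],
)

# Sort the two values within each row so that (cherry, apple) and
# (apple, cherry) produce the same key.
key = pd.DataFrame(
    np.sort(df1[['fruit1', 'fruit2']].to_numpy(), axis=1),
    index=df1.index,
)

# Keep the first row seen for each unordered pair; the surviving rows keep
# their original column order.
deduped = df1[~key.duplicated()].reset_index(drop=True)
print(deduped)
```

The sort runs over a NumPy array rather than row by row in Python, so it should scale better than iterrows. On this toy frame it drops the three mirrored rows and leaves 12 of the 15.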