For two columns, I'd just use a combination of boolean indexes:
data = {'Thing1': ['Egg', 'Apple', 'Dog', 'Cat', 'Banana', 'Man', 'Elephant', 'Football', 'Egg', 'Elephant',
'Rome', 'Pakistan', 'Egg', 'Dog', 'Banana', 'Rome', 'Apple', 'Lime', 'Album'],
'Thing2': ['Lemon', 'Birmingham', 'Egg', 'Dog', 'Germany', 'Flower', 'Banana', 'Egg', 'Birmingham',
'School', 'Lake', 'Swimmer', 'Woman', 'Flower', 'Football', 'Egg', 'Waterfall', 'Egg', 'Bat'],
'PairScore': [3, 2, 5, 6, 4, 1, 8, 2, 6, 4, 3, 2, 4, 5, 6, 7, 3, 2, 1]}
df = pd.DataFrame(data)
>>> df.head()
Thing1 Thing2 PairScore
0 Egg Lemon 3
1 Apple Birmingham 2
2 Dog Egg 5
3 Cat Dog 6
4 Banana Germany 4
Selecting as per your criteria
m = df.Thing1.eq("Egg") | df.Thing2.eq("Egg")
# or even m = df.filter(like="Thing").eq("Egg").any(axis=1)
df2 = df[m & df.PairScore.lt(3)]
Result:
Thing1 Thing2 PairScore
7 Football Egg 2
17 Lime Egg 2
From there, one option to proceed is to .stack()
the Thing
columns into a single column, e.g., do df3[~df3.eq("Egg")]
to get rid of rows containing "Egg".
>>> df3 = df2.filter(like="Thing").stack()
>>> df3
7 Thing1 Football
Thing2 Egg
17 Thing1 Lime
Thing2 Egg
dtype: object
>>> df3[~df3.eq("Egg")]
7 Thing1 Football
17 Thing1 Lime
dtype: object
In order to do subsequent analyses on df
, excluding the rows from df2
above, you can proceed with
>>> df4 = pd.concat([df, df2]).drop_duplicates(keep=False)
df4
now contains everything from df
except of the previous matches:
>>> df4.tail()
Thing1 Thing2 PairScore
13 Dog Flower 5
14 Banana Football 6
15 Rome Egg 7
16 Apple Waterfall 3
18 Album Bat 1 # 17 is gone