This is a common question, but I can't find an answer that fits this particular scenario.
I have a DataFrame with a Genre column (e.g. "Drama, Western") and one-hot encoded columns for each genre. For a "Drama, Western" row there is a 1 in both the Drama and Western columns; where the genre is just Western, the Western column is 1 and the Drama column is 0.
I want a filtered DataFrame containing only the rows whose sole genre is Western. I'm trying to oversample Western for a model because it is a minority class, but I don't want to increase the counts of other genres as a byproduct.
There are many such rows, so I can't select them by index, and there are 24 genres, so a condition like df[(df['Western']==1) & (df['Drama']==0)] would have to spell out every one of them.
Index | Genre           | Drama | Western | Action | genre 4 |
0     | Drama, Western  | 1     | 1       | 0      | 0       |
1     | Western         | 0     | 1       | 0      | 0       |
3     | Action, Western | 0     | 1       | 1      | 0       |
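
For reference, here is a minimal sketch that rebuilds the sample rows above and shows the kind of filter I am after. It assumes every column other than Genre is a 0/1 one-hot genre column; the names genre_cols and only_western are made up for illustration. The idea is to keep rows where Western is 1 and the row sum across all genre columns is exactly 1, so the other genres never have to be listed by name.

```python
import pandas as pd

# Sample data mirroring the rows shown above (index values 0, 1, 3).
df = pd.DataFrame(
    {
        "Genre": ["Drama, Western", "Western", "Action, Western"],
        "Drama": [1, 0, 0],
        "Western": [1, 1, 1],
        "Action": [0, 0, 1],
        "genre 4": [0, 0, 0],
    },
    index=[0, 1, 3],
)

# Assumption: every column except "Genre" is a one-hot genre column.
genre_cols = df.columns.drop("Genre")

# Keep rows where Western is set and it is the only genre set.
is_western = df["Western"] == 1
single_genre = df[genre_cols].sum(axis=1) == 1
only_western = df[is_western & single_genre]

print(only_western)
```

On the sample rows this should keep only the row with index 1. If the Genre text column is reliable, comparing it directly with df[df["Genre"] == "Western"] would presumably select the same rows.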