I have a pandas DataFrame of texts. Each text belongs to exactly one genre and can belong to several categories, so the category columns are one-hot encoded.
Here is an example (the actual dataframe has a lot more categories):
import pandas as pd

df = pd.DataFrame({
    'text': {0: 'This is an example string', 1: 'this is another example', 2: 'and another', 3: 'and yet another example'},
    'genre': {0: 'fiction', 1: 'fiction', 2: 'scientific', 3: 'news'},
    'category_nature': {0: 1, 1: 1, 2: 0, 3: 1},
    'category_history': {0: 1, 1: 0, 2: 0, 3: 1},
    'category_art': {0: 0, 1: 0, 2: 1, 3: 0},
})
I'm looking for a way to get something like value_counts() on the categories, but broken down by genre as well, like this:
I first tried dropping the other columns and summing the one-hot-encoded ones, but then I lose the "genre" column:
# Drop the non-numeric columns, then total each one-hot column
df_new = df.drop(columns=['text', 'genre'])
count = df_new.sum().sort_values(ascending=False)
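For comparison, here is a minimal sketch of one direction that might keep the genre: grouping by genre before summing, so the genre survives as the index. It assumes every one-hot column starts with the prefix category_ (true for the example, but an assumption about the real data), and the names category_cols, counts and long_counts are only illustrative.

```python
import pandas as pd

# Same example data as above, written with plain lists
df = pd.DataFrame({
    'text': ['This is an example string', 'this is another example',
             'and another', 'and yet another example'],
    'genre': ['fiction', 'fiction', 'scientific', 'news'],
    'category_nature': [1, 1, 0, 1],
    'category_history': [1, 0, 0, 1],
    'category_art': [0, 0, 1, 0],
})

# Assumption: all one-hot columns share the 'category_' prefix
category_cols = [c for c in df.columns if c.startswith('category_')]

# Sum the one-hot columns within each genre; genre becomes the index
counts = df.groupby('genre')[category_cols].sum()

# Optional long format: one row per (genre, category) pair
long_counts = counts.reset_index().melt(
    id_vars='genre', var_name='category', value_name='count'
)

print(counts)
print(long_counts)
```

The wide result has one row per genre and one column per category; the long version may be closer to what value_counts() returns, with one count per genre and category pair.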
I also checked the following post, but it wasn't exactly what I was looking for.
Python: get a frequency count based on two columns (variables) in pandas dataframe some row appers