Pivoting a One-Hot-Encode Dataframe

Question

I have a pandas dataframe that looks like this:

genres.head()

   Drama   Comedy  Action  Crime   Romance Thriller    Adventure   Horror  Mystery Fantasy ... History Music   War Documentary Sport   Musical Western Film-Noir   News    number_of_genres
tconst                                                                                  
tt0111161   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0468569   1   0   1   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt1375666   0   0   1   0   0   0   1   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt0137523   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0110912   1   0   0   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   2

I want to be able to get a table where the rows are the genres, the columns are the number of labels for a given movie and the values are the counts. In other words, I want this:

number_of_genres    1   2   3   totals
Drama   451 1481    3574    5506
Comedy  333 1108    2248    3689
Action  9   230 1971    2210
Crime   1   284 1687    1972
Romance 1   646 1156    1803
Thriller    22  449 1153    1624
Adventure   1   98  1454    1553
Horror  137 324 765 1226
Mystery 0   108 792 900
Fantasy 1   74  642 717
Sci-Fi  0   129 551 680
Biography   0   95  532 627
Family  0   60  452 512
Animation   0   6   431 437
History 0   32  314 346
Music   1   87  223 311
War 0   90  162 252
Documentary 70  82  78  230
Sport   0   78  142 220
Musical 0   13  131 144
Western 19  44  57  120
Film-Noir   0   11  50  61
News    0   1   2   3
Total   1046    5530    18567   25143

What is the best way of getting that table pythonistically? I solved the problem through the following code but was wondering if there's a better way:

genres['number_of_genres'] = genres.sum(axis=1)
pivots = []
for column in genres.columns[0:-1]:
    column = pd.DataFrame(genres[column])
    columns = column.join(genres.number_of_genres)
    pivot = pd.pivot_table(columns, values=columns.columns[0], columns='number_of_genres', aggfunc=np.sum)
    pivots.append(pivot)

pivots_df = pd.concat(pivots)
pivots_df['totals'] = pivots_df.sum(axis=1)
pivots_df.loc['Total'] = pivots_df.sum()

[EDIT]: Added jupyter output that should be compatible with pd.read_clipboard(). If I can format the output better, please let me know how I can do so.

if you don't get any responses or start to recieve down votes you may want to look at this post about [how to make a good `pandas` example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — MattR, Mar 05 '19 at 21:37
@MattR, I see what you mean. How do I include the output of my Jupyter cell in my question though? It doesn't seem like it's a copy and paste. — Python Developer, Mar 05 '19 at 22:14
@GideonBlinick I think you made some good edits. Keep posting questions with a similar format to this and the ones in the link that I posted and you will be on your way to the happy place within the `pandas` happy place here on Stack Overflow :) — MattR, Mar 06 '19 at 13:03

JoergVanAken · Accepted Answer · 2019-03-06T05:13:25.140

1

Maybe I'm missing something but doesn't this work for you?

agg = df.groupby('number_of_genres').agg('sum').T
agg['totals'] = agg.sum(axis=1)

Edit: Solution via pivot_table

agg = df.pivot_table(columns='number_of_genres', aggfunc='sum')
agg['total'] = agg.sum(axis=1)

edited Mar 06 '19 at 05:13

answered Mar 05 '19 at 23:04

JoergVanAken

1,286
9
10

Thanks @JoergVanAken! You weren't missing anything. That's what I'm looking for. I was trying to do it with pivot_table. Do you know how you can do it that way? – Python Developer Mar 05 '19 at 23:54
wow, that's super simple. I guess I found the most convoluted way of answering it. thanks for the help! – Python Developer Mar 06 '19 at 14:33

Pivoting a One-Hot-Encode Dataframe

1 Answers1

Linked