2

i have a dataframe that looks like this:

NAME    MONTH   HOURS
John   January    5
John   February   3
Sarah  August     2
Sarah  February   1

And the output i need:

NAME January February August TOTAL_HOURS
John   5        3       0        8
Sarah  0        1       2        3

I tryied using crosstab but cannot make it to maintaing the name Somebody can hlp me?

1 Answers1

4

For correct ordering is necessary first convert column MONTH to ordered categoricals, then use DataFrame.pivot_table with margins paramter and last remove last row by DataFrame.iloc:

cats = ['January','February','March','April','May','June',
        'July','August','September','October','November','December']
df['MONTH'] = pd.Categorical(df['MONTH'], ordered=True, categories=cats)

df = df.pivot_table(index='NAME', 
                    columns='MONTH', 
                    values='HOURS',
                    margins=True, 
                    margins_name='TOTAL_HOURS',
                    fill_value=0,
                    aggfunc='sum').iloc[:-1]
print (df)
MONTH  January  February  August  TOTAL_HOURS
NAME                                         
John         5         3       0            8
Sarah        0         1       2            3

Last convert index to column:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
    NAME  January  February  August  TOTAL_HOURS
0   John        5         3       0            8
1  Sarah        0         1       2            3
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252