Any Pandas's function to aggregate and dummy a table?

Question

There is data frame like this:

   id   code    count
0   1   101     777
1   1   102     170
2   1   103     65
3   2   101     43
4   2   102     40

'code' columns is a categorical variable and needs to be dummied based on value of column 'count' and aggregated for each 'id' and here is the table I would like to have:

   id   code_101    code_102    code_103
0   1   777         170         65
1   2   43          40          0

Any idea?

Use `df.pivot('id','code','count').add_prefix('code_').fillna(0)` — jezrael, Sep 20 '19 at 05:29
Or `df.set_index(['id', 'code'])['count'].unstack(fill_value=0).add_prefix('code_')` — jezrael, Sep 20 '19 at 05:31
@jezrael: Thanks for suggestion. However when I tried on real data (not this example), I got this error: `ValueError: Index contains duplicate entries, cannot reshape` — MTT, Sep 20 '19 at 05:45
Sure, it means is necesary `pivot_table`, in dupe it is explain more. — jezrael, Sep 20 '19 at 05:45
So both suggestion are changed to `df.pivot_table(index='id',columns='code',values='count', aggfunc='sum', fill_values=0).add_prefix('code_')` or `df.groupby(['id', 'code'])['count'].sum().unstack(fill_value=0).add_prefix('code_')` — jezrael, Sep 20 '19 at 05:47

score 1 · Answer 1 · answered Sep 20 '19 at 05:26

1

new_ix = pd.MultiIndex.from_product([range(1,3), [101, 102,103]], names=['id', 'code'])
df_new = df.set_index(['id', 'code'])
df_new.reindex(new_ix, fill_value=0).reset_index()
df_new=df_new.unstack().fillna(0)
df_new.columns=['_'.join(map(str,i)) for i in df_new.columns]
df_new.reset_index()

output

id  count_101   count_102   count_103
0   1   777.0   170.0   65.0
1   2   43.0    40.0    0.0

answered Sep 20 '19 at 05:26

moys

7,747
2
11
42

1

@jezrael you are correct. it is much more simple than i made it to be. – moys Sep 20 '19 at 05:31

Any Pandas's function to aggregate and dummy a table?

1 Answers1