0

There is data frame like this:

   id   code    count
0   1   101     777
1   1   102     170
2   1   103     65
3   2   101     43
4   2   102     40

'code' columns is a categorical variable and needs to be dummied based on value of column 'count' and aggregated for each 'id' and here is the table I would like to have:

   id   code_101    code_102    code_103
0   1   777         170         65
1   2   43          40          0 

Any idea?

MTT
  • 5,113
  • 7
  • 35
  • 61
  • 1
    Use `df.pivot('id','code','count').add_prefix('code_').fillna(0)` – jezrael Sep 20 '19 at 05:29
  • 1
    Or `df.set_index(['id', 'code'])['count'].unstack(fill_value=0).add_prefix('code_')` – jezrael Sep 20 '19 at 05:31
  • @jezrael: Thanks for suggestion. However when I tried on real data (not this example), I got this error: `ValueError: Index contains duplicate entries, cannot reshape` – MTT Sep 20 '19 at 05:45
  • 1
    Sure, it means is necesary `pivot_table`, in dupe it is explain more. – jezrael Sep 20 '19 at 05:45
  • 1
    So both suggestion are changed to `df.pivot_table(index='id',columns='code',values='count', aggfunc='sum', fill_values=0).add_prefix('code_')` or `df.groupby(['id', 'code'])['count'].sum().unstack(fill_value=0).add_prefix('code_')` – jezrael Sep 20 '19 at 05:47

1 Answers1

1
new_ix = pd.MultiIndex.from_product([range(1,3), [101, 102,103]], names=['id', 'code'])
df_new = df.set_index(['id', 'code'])
df_new.reindex(new_ix, fill_value=0).reset_index()
df_new=df_new.unstack().fillna(0)
df_new.columns=['_'.join(map(str,i)) for i in df_new.columns]
df_new.reset_index()

output

id  count_101   count_102   count_103
0   1   777.0   170.0   65.0
1   2   43.0    40.0    0.0
moys
  • 7,747
  • 2
  • 11
  • 42