0

I have dataframe like this

   customer_id  product_id  sum
0      1            A        20
1      2            A        10
2      2            C        30
3      1            B        40

I want to transform it into a sparse dataframe where I will have customer data as rows and columns will be decoded products in order. Something like this:

   customer_id  Product_A_Sum  Product_B_Sum  Product_C_Sum
 0     1            20            40             0
 1     2            10             0             30

I have a brute force solution - something like this:

df_new= pd.DataFrame()
df_new['customer_id'] = df.customer_id.unique()
for product in range(len(list_products)):
    temp = df.groupby(['customer_id', product])['sum'].sum().reset_index().rename(columns = {'sum':('sum'+product)})
    df_new = df_new.merge(temp[['customer_id', 'sum'+product]], how='left', on = 'customer_id').fillna(0)

This code works but my list of products is large so it will not scale at all. Are there any pandas tricks that will allow me to do this easier?

Thank you in advance!

ov08
  • 125
  • 6

2 Answers2

2

try this,

df = df.pivot_table(index='customer_id', columns='product_id', values='sum', fill_value=0)
df.columns = [f"Product_sum_{x}" for x in df.columns]
df = df.reset_index()

O/P:

    customer_id  Product_sum_A  Product_sum_B  Product_sum_C
0            1             20             40              0
1            2             10              0             30
Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111
1

You should use pivot:

df.pivot(index='customer_id', columns='product_id', values='sum')
gtomer
  • 5,643
  • 1
  • 10
  • 21