I have a question that concerns the use of the pivot function in pandas. I have a table (df_init) with a bunch of customer Ids (7000 different Ids) and the product codes they purchased
CST_ID | PROD_CODE |
---|---|
11111 | 1234 |
11111 | 2345 |
11111 | 5425 |
11111 | 9875 |
22222 | 2345 |
22222 | 9251 |
22222 | 1234 |
33333 | 6542 |
33333 | 7498 |
Each Id can be repeated at most 4 time in the table, but can appear less than 4 times (e,g, 22222 and 33333). I want to reorganize that table as follows (df_fin)
CST_ID | PROD_1 | PROD_2 | PROD_3 | PROD_4 |
---|---|---|---|---|
11111 | 1234 | 2345 | 5425 | 9875 |
22222 | 2345 | 9251 | 1234 | NaN |
33333 | 6542 | 7498 | NaN | NaN |
Good news is, I have found a way to do so. Bad news I am not satisfied as it loops over the Customer Ids nd takes a while. Namely I count the occurrences of a certain Id while looping over the column and add that to a list, then append this list as a new variable to df_init
to_append = []
for index in range(len(df_init)):
temp = df_init.iloc[:index+1]['CST_ID'] == df_init.iloc[index]['CST_ID'] # ['CST_ID']== df_init.iloc[index]['CST_ID']]
counter = sum(list(temp))
to_append.append(counter)
df_init['Product_number'] = to_append
Afterwards I pivot and rename the columns
df_fin = df_init.pivot(index='CST_ID', columns='Product_number', values='PROD_CODE').rename_axis(None).reset_index()
df_fin.columns=['CST_ID', 'pdt1', 'pdt2', 'pdt3', 'pdt4']
Of course this solution works just fine, but looping in order to create the column which I use for the columns specification of the Pivot takes time. Hence I was wondering if there was a better solution (perhapes embedded already in Pandas or in the Pivot method) to do so.
Thanks to anyone who is willing to participate
Best