How to convert a pandas dataframe into a format (similar to one hot encoding) taking an amount-column into account

Question

What is the most convenient way to convert a pandas DataFrame (containing date, amount, and category columns) into a format similar to one-hot encoding that takes the amount column into account? Please see the example below.

You can read the [official docs](https://pandas.pydata.org/docs/user_guide/reshaping.html#reshaping-by-pivoting-dataframe-objects) on reshaping. — Henry Yik, Oct 15 '21 at 14:14
Why not use pivot? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html — nbk, Oct 15 '21 at 14:28
Thx, I have tried pivot. However, it failed as my index column (aka Date) contains duplicate entries. Maybe I am missing something. — Raphael, Oct 15 '21 at 14:32
Ok, I have solved the problem. I had the problem that in my category column there were also some duplicates for a certain index (aka date). This raised the 'ValueError: Index contains duplicate entries, cannot reshape'. Solution: I have aggregated the amount column by category first. Afterwards, the df.pivot(index='Date', columns='Category', values='Amount') conversion worked. — Raphael, Oct 15 '21 at 15:00
You can also use [`pivot_table`](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html) and specify the aggregation to be performed in the case of duplicates. For example if you wanted the total amount on Date you could use `df.pivot_table(index='Date', columns='Category', values='Amount', aggfunc='sum', fill_value=0)` — Henry Ecker, Oct 15 '21 at 18:29
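
To make the comments above concrete, here is a minimal sketch of both suggestions. The column names `Date`, `Category`, and `Amount` come from the comments; the sample rows are made up for illustration and deliberately contain a duplicate (Date, Category) pair, which is what makes plain `pivot` fail.

```python
import pandas as pd

# Hypothetical sample data; ("2021-10-01", "food") appears twice on purpose.
df = pd.DataFrame({
    "Date": ["2021-10-01", "2021-10-01", "2021-10-01", "2021-10-02"],
    "Category": ["food", "food", "rent", "food"],
    "Amount": [10, 5, 700, 20],
})

# Plain pivot cannot reshape when a (Date, Category) pair occurs more than once.
try:
    df.pivot(index="Date", columns="Category", values="Amount")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Option 1: pivot_table aggregates the duplicates while reshaping.
wide = df.pivot_table(index="Date", columns="Category", values="Amount",
                      aggfunc="sum", fill_value=0)

# Option 2: aggregate first so each pair is unique, then use plain pivot.
agg = df.groupby(["Date", "Category"], as_index=False)["Amount"].sum()
wide2 = agg.pivot(index="Date", columns="Category", values="Amount").fillna(0)

print(wide)
```

Both options should give one row per date and one column per category, with the summed amount in each cell and 0 where a category does not occur on that date. Whether `sum` is the right aggregation depends on what duplicate entries mean in your data.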

score 0 · Answer 1 · answered Oct 15 '21 at 14:17

You can loop over the rows of the DataFrame and use each row's category as the column label when writing its value into a new DataFrame. See the example below:

import pandas as pd

# create the example DataFrames
d1 = {'value' : [100,200,300], 'char': ['a' , 'b', 'c']}
d2 = {'a' : [None, None, None], 'b':  [None, None, None], 'c': [None, None, None] }
d1 = pd.DataFrame(data=d1)
d2 = pd.DataFrame(data=d2)

# loop over the rows and write each value into the column named by its 'char'
for i in d1.index:
    # .at avoids chained indexing, which may not write through to d2
    d2.at[i, d1.at[i, 'char']] = d1.at[i, 'value']
    
print(d2)
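
For what it's worth, the same table can probably be built without an explicit loop. This is a minimal sketch, assuming the same `d1` as in the example above and assuming each row index appears only once; it uses `DataFrame.pivot` with the existing row index instead of a pre-filled `d2`.

```python
import pandas as pd

d1 = pd.DataFrame({"value": [100, 200, 300], "char": ["a", "b", "c"]})

# Spread 'value' across one column per 'char', keeping the original row index.
# Cells with no matching entry come out as NaN rather than None.
d2 = d1.pivot(columns="char", values="value")

print(d2)
```

Unlike the loop, this does not require knowing the category names in advance, since the columns are derived from the distinct values in `char`.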
