What is the most convenient way to convert a pandas DataFrame (containing date, amount, and category columns) into a one-hot encoding format that takes the amount column into account? Please see the example below.
- You can read the [official docs](https://pandas.pydata.org/docs/user_guide/reshaping.html#reshaping-by-pivoting-dataframe-objects) on reshaping. – Henry Yik Oct 15 '21 at 14:14
- Why not use pivot? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html – nbk Oct 15 '21 at 14:28
- Thx, I have tried pivot. However, it failed as my index column (aka Date) contains duplicate entries. Maybe I am missing something. – Raphael Oct 15 '21 at 14:32
- Ok, I have solved the problem. My category column contained duplicates for a given index value (aka date), which raised `ValueError: Index contains duplicate entries, cannot reshape`. Solution: I aggregated the amount column by category first. Afterwards, the `df.pivot(index='Date', columns='Category', values='Amount')` conversion worked. – Raphael Oct 15 '21 at 15:00
- You can also use [`pivot_table`](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html) and specify the aggregation to be performed in the case of duplicates. For example, if you wanted the total amount per Date, you could use `df.pivot_table(index='Date', columns='Category', values='Amount', aggfunc='sum', fill_value=0)` – Henry Ecker Oct 15 '21 at 18:29
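
The two approaches from the comments can be compared in a short sketch. The column names follow the question; the sample rows are made up for illustration, with one duplicated Date/Category pair so that a plain `pivot` on the raw data would fail.

```python
import pandas as pd

# Hypothetical sample data; the ('2021-10-01', 'food') pair appears twice
df = pd.DataFrame({
    'Date': ['2021-10-01', '2021-10-01', '2021-10-01', '2021-10-02'],
    'Category': ['food', 'food', 'rent', 'food'],
    'Amount': [10, 5, 700, 20],
})

# Option 1: aggregate the duplicates first, then pivot
agg = df.groupby(['Date', 'Category'], as_index=False)['Amount'].sum()
wide_pivot = agg.pivot(index='Date', columns='Category', values='Amount').fillna(0)

# Option 2: let pivot_table aggregate and fill the gaps in one call
wide_table = df.pivot_table(index='Date', columns='Category',
                            values='Amount', aggfunc='sum', fill_value=0)

print(wide_table)
```

Both variants should give one row per Date and one column per Category, with the summed Amount in each cell and 0 where a category does not occur on that date.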
1 Answer
You can loop over the rows of the original DataFrame and use each row's values to index into the new DataFrame. See the example below:

    import pandas as pd

    # create the example DataFrames
    d1 = {'value': [100, 200, 300], 'char': ['a', 'b', 'c']}
    d2 = {'a': [None, None, None], 'b': [None, None, None], 'c': [None, None, None]}
    d1 = pd.DataFrame(data=d1)
    d2 = pd.DataFrame(data=d2)

    # loop over the rows of d1 and write each value into the column of d2 named by 'char';
    # .loc avoids chained assignment, which may not modify d2 reliably
    for i in d1.index:
        d2.loc[i, d1.loc[i, 'char']] = d1.loc[i, 'value']

    print(d2)
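
For larger frames, a loop-free variant may be preferable. The following is only a sketch of the same idea using `DataFrame.pivot` on the example data from this answer; it assumes each row index appears once, and unfilled cells come out as NaN rather than None.

```python
import pandas as pd

# Same example data as in the loop version
d1 = pd.DataFrame({'value': [100, 200, 300], 'char': ['a', 'b', 'c']})

# Spread 'value' across one column per distinct 'char', keeping the existing row index
d2 = d1.pivot(columns='char', values='value')

print(d2)
```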

Thijs Ruigrok