
What is the most convenient way to convert a pandas dataframe (containing date, amount, and category columns) into a one-hot encoding format that takes the amount column into account? Please see the example below.

[Image: example; not preserved]

Raphael
  • You can read the [official docs](https://pandas.pydata.org/docs/user_guide/reshaping.html#reshaping-by-pivoting-dataframe-objects) on reshaping. – Henry Yik Oct 15 '21 at 14:14
  • Why not use pivot? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html – nbk Oct 15 '21 at 14:28
  • Thanks, I have tried pivot. However, it failed because my index column (Date) contains duplicate entries. Maybe I am missing something. – Raphael Oct 15 '21 at 14:32
  • OK, I have solved the problem. My category column also contained duplicates for some index values (dates), which raised `ValueError: Index contains duplicate entries, cannot reshape`. Solution: I aggregated the amount column by category first. After that, the `df.pivot(index='Date', columns='Category', values='Amount')` conversion worked. – Raphael Oct 15 '21 at 15:00
  • You can also use [`pivot_table`](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html) and specify the aggregation to perform when there are duplicates. For example, if you wanted the total amount per Date, you could use `df.pivot_table(index='Date', columns='Category', values='Amount', aggfunc='sum', fill_value=0)` – Henry Ecker Oct 15 '21 at 18:29
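
The two approaches from the comments above can be compared in a minimal sketch. The sample rows and category names here are made up for illustration; only the column names `Date`, `Category`, and `Amount` come from the comments. It first shows plain `pivot` failing on duplicate (Date, Category) pairs, then aggregates before pivoting, and finally uses `pivot_table` to do both steps in one call.

```python
import pandas as pd

# Hypothetical sample data: 'food' appears twice on the same date,
# which is the duplicate case described in the comments.
df = pd.DataFrame({
    "Date": ["2021-10-01", "2021-10-01", "2021-10-01", "2021-10-02"],
    "Category": ["food", "food", "rent", "food"],
    "Amount": [10, 5, 500, 7],
})

# Plain pivot cannot reshape duplicate (Date, Category) pairs.
try:
    df.pivot(index="Date", columns="Category", values="Amount")
except ValueError as err:
    print("pivot failed:", err)

# Option 1: sum the duplicates first, then pivot and fill the gaps with 0.
summed = df.groupby(["Date", "Category"], as_index=False)["Amount"].sum()
wide_pivot = summed.pivot(index="Date", columns="Category", values="Amount").fillna(0)

# Option 2: pivot_table aggregates and fills missing cells in one call.
wide_table = df.pivot_table(
    index="Date", columns="Category", values="Amount",
    aggfunc="sum", fill_value=0,
)

print(wide_table)
```

Both options should yield one row per date and one column per category. Summing is only one possible choice; `aggfunc` could be `'mean'`, `'max'`, or another aggregation, depending on what a duplicate entry is supposed to mean in the data.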

1 Answer

You can loop over the dataframe and use the values of each entry to index into the new dataframe. See the example below:

import pandas as pd

# create the example dataframes
d1 = {'value': [100, 200, 300], 'char': ['a', 'b', 'c']}
d2 = {'a': [None, None, None], 'b': [None, None, None], 'c': [None, None, None]}
d1 = pd.DataFrame(data=d1)
d2 = pd.DataFrame(data=d2)

# loop over the rows; .loc writes each value into the column named by 'char'
# (chained indexing such as d2[col][i] = ... may fail to update d2)
for i in d1.index:
    d2.loc[i, d1.loc[i, 'char']] = d1.loc[i, 'value']
    
print(d2)
Thijs Ruigrok