1

I have a df which I am trying to denormalize. Basically I want to change parameter values such as, 'inst-cap-c', 'cap-lo-c', etc... into columns.

In order to do that there were so far 2 methods within the pandas lib. With both of them I encountered some problems and were unable to denormalize this df...

df looks like following (for simplicity used ...):

data       
   Site  Storage Commodity     parameter    value
0  Mid   Pump    Elec         inst-cap-c        0
1  Mid   Pump    Elec           cap-lo-c        0
2  Mid   Pump    Elec           cap-up-c  1.5e+15
3  Mid   Pump    Elec         inst-cap-p        0
4  Mid   Pump    Elec           cap-lo-p        0
...
52 South Pump    Elec               wacc     0.07
53 South Pump    Elec       depreciation       50
54 South Pump    Elec               init        1
55 South Pump    Elec          discharge  3.5e-06
56 South Pump    Elec           ep-ratio     None

When I try to create columns with parameter values via:

data.pivot_table(values='value',
                 index=['Site', 'Storage', 'Commodity'],
                 columns='parameter')

it simply says: *** pandas.core.base.DataError: No numeric types to aggregate

I am guessing it is because None value of ep-ratio, I can't use NaN over None, because it created other problems.

So how can I denormalize this dataframe?

Expected outcome:

data       
   Site    Storage  Commodity  inst-cap-c  cap-lo-c cap-up-c ... ep-ratio
0  Mid     Pump     Elec                0         0  1.5e+15 ...     None
1  North   Pump     Elec                0         0  1.5e+15 ...     None
2  South   Pump     Elec                0         0  1.5e+15 ...     None

Extra:

data.set_index(['Site', 'Storage','Commodity'], append=True).unstack('parameter')
*** KeyError: 'Level parameter not found'

I checked this also: pivot_table No numeric types to aggregate it does not help

oakca
  • 1,408
  • 1
  • 18
  • 40
  • "I can't use NaN over None, because it created other problems." You should prefer to use NaN, you could perhaps replace the NaN with None after pivot BUT strongly you should prefer simply using NaN. – Andy Hayden Mar 01 '19 at 19:30
  • @AndyHayden with NaN there is a problem with pivot_table(): so it lets you choose if u want to drop values via dropna=True, it is set by default. The problem is with some data I have to dropna and with some of them like the one above I should not drop it... It is because of the structure of my data. So I cant end up with a consistent method which I could use every case of my model data. That is why I changed the NaN values to None before. Now I cannot be able to perform the denormalizing... Looking for better suggestions... – oakca Mar 02 '19 at 16:23

1 Answers1

1

You are close, need parameter column add to list, select column value before unstack and last use reset_index with rename_axis for data cleaning:

df = (data.set_index(['Site', 'Storage','Commodity','parameter'])['value']
          .unstack()
          .reset_index()
          .rename_axis(None, axis=1))
print (df)
    Site Storage Commodity cap-lo-c cap-lo-p cap-up-c depreciation discharge  \
0    Mid    Pump      Elec        0        0  1.5e+15          NaN       NaN   
1  South    Pump      Elec      NaN      NaN      NaN           50   3.5e-06   

  ep-ratio init inst-cap-c inst-cap-p  wacc  
0      NaN  NaN          0          0   NaN  
1     None    1        NaN        NaN  0.07  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252