1

I have a data frame which looks like this:

Salesman  Height   product      value
  Knut      6        voltage      5
  Knut      6        frequency    1
  Knut      6        speed        3
  Knut      3        voltage      3
  Knut      3        frequency    5
  Knut      3        speed        3
  Steve     6        frequency    2
  Steve     6        voltage      2

I want to reshape it so that it looks like this:

Salesman  Height    voltage  frequency  speed   
  Knut      6        5            1        3
  Knut      3        3            5        3
  Steve     6        2            2        NA    

For every "salesman" and "height" combination, each value of the column "product" should be changed as a column and would hold the value from the column value. If the value doesn't exist NA should be the placeholder.

This is my first time with panda and pretty much stuck with it. Looked into reshaping and pivot but couldn't get any result.

Scott Boston
  • 147,308
  • 15
  • 139
  • 187
ro ko
  • 2,906
  • 3
  • 37
  • 58

1 Answers1

2

IIUC,Using pivot_table

df.pivot_table(index=['Salesman','Height'],columns='product',values='value',aggfunc='sum').reset_index()
Out[873]: 
product Salesman  Height  frequency  speed  voltage
0           Knut       3        5.0    3.0      3.0
1           Knut       6        1.0    3.0      5.0
2          Steve       6        2.0    NaN      2.0

Without aggfunc and reset_index

df.pivot_table(index=['Salesman','Height'],columns='product',values='value')
Out[875]: 
product          frequency  speed  voltage
Salesman Height                           
Knut     3             5.0    3.0      3.0
         6             1.0    3.0      5.0
Steve    6             2.0    NaN      2.0
BENY
  • 317,841
  • 20
  • 164
  • 234
  • why the aggfunc ? and reset_index ? – ro ko Nov 29 '17 at 03:28
  • @roko that is the habit of my, in case I need to aggregate some value, If your are sure each product is unique for the index, you can remove it , reset_index is to make sure get the same output your showed :-), If we do not added, we will have mutiindex for the result – BENY Nov 29 '17 at 03:31
  • @roko feel free to upvote the answer as well if you found it useful. – piRSquared Nov 29 '17 at 04:19
  • @Wen If I am to use reset_index, how can I drop the "product" column/index later when I don't need it? – ro ko Nov 29 '17 at 04:29
  • 1
    @roko that is column names ,using `df.pivot_table(index=['Salesman','Height'],columns='product',values='value',aggfunc='sum').reset_index().rename_axis(None,1)` – BENY Nov 29 '17 at 04:32