pandas group by: usage in this instance

Question

Despite the extensive help provided here and here, I was unable to figure out how to do the following:

Given this dataset (df):

import pandas as pd
import numpy as np

df = pd.DataFrame([['CORE1', 'CORE2', 'CORE3', 'CORE1', 'CORE2', 'CORE3', 'CORE1', 'CORE2', 'CORE3', ],
                   ['alfa', 'beta', 'gamma', 'alfa', 'beta', 'gamma', 'alfa', 'beta', 'gamma', ],
                   np.random.rand(9).tolist()],
                  index=['ptf', 'name', 'value']).transpose()

name value ptf
alfa  0.1  CORE1
beta  0.7  CORE1
gamma 0.2  CORE1
alfa  0.3  CORE2
beta  0.4  CORE2
gamma 0.3  CORE2
alfa  0.9  CORE3
beta  0.05 CORE3
gamma 0.05 CORE3

turn it into

      CORE1 CORE2 CORE3
alfa  0.1   0.3   0.9
beta  0.7   0.4   0.05
gamma 0.2   0.3   0.05

I was guessing something along the lines of df.groupby(by='ptf') followed by something else. What exactly that is remains to be understood.

Edit:

print(df.dtypes)

# 1st - works but uses the numeric row index - not what I want
print(df.pivot(columns='ptf', values='value'))
# 2nd - the textbook call - does not work
print(df.pivot(index='name', columns='ptf', values='value'))
# 3rd - same as 2nd but with different constructor
print(pd.pivot_table(df, index='name', values='value', columns='ptf'))

Any help in the matter?

I always like to point to [this page](http://www.nikgrozev.org/2015/07/01/reshaping-in-pandas-pivot-pivot-table-stack-and-unstack-explained-with-pictures/) for a good introduction to pivoting and stacking. — IanS, Jul 01 '16 at 11:25

score 3 · Accepted Answer · answered Jul 01 '16 at 09:33

Use pivot:

print (df.pivot(index='name', columns='ptf', values='value'))
ptf    CORE1  CORE2  CORE3
name                      
alfa     0.1    0.3   0.90
beta     0.7    0.4   0.05
gamma    0.2    0.3   0.05
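
A minimal, self-contained sketch of the same call, assuming the data matches the table printed in the question (one row per name/ptf pair, numeric `value`). The literal values are copied from that table, and the variable name `wide` is only illustrative:

```python
import pandas as pd

# Long-format data copied from the table shown in the question:
# one row per (name, ptf) pair, with 'value' stored as floats.
df = pd.DataFrame({
    'name':  ['alfa', 'beta', 'gamma'] * 3,
    'value': [0.1, 0.7, 0.2, 0.3, 0.4, 0.3, 0.9, 0.05, 0.05],
    'ptf':   ['CORE1'] * 3 + ['CORE2'] * 3 + ['CORE3'] * 3,
})

# Rows become 'name', columns become 'ptf', cells hold 'value'.
wide = df.pivot(index='name', columns='ptf', values='value')
print(wide)
```

This only works when every (name, ptf) pair occurs at most once; otherwise `pivot` raises the `ValueError` about duplicate entries.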

jezrael

822,522
95
1,334
1,252

It's the right approach, but I get the same error some users encounter here (http://stackoverflow.com/questions/11232275/pandas-pivot-warning-about-repeated-entries-on-index): ValueError: Index contains duplicate entries, cannot reshape – Asher11 Jul 01 '16 at 10:00
You need `df.pivot_table(index='name', columns='ptf', values='value')`, which aggregates duplicates with `aggfunc` (the default is `aggfunc=np.mean`). A better explanation with a sample is [here](http://stackoverflow.com/a/37436813/2901002) and in the [docs](http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables). – jezrael Jul 01 '16 at 10:07
tried that as well. `Error in this case: pandas.core.base.DataError: No numeric types to aggregate` – Asher11 Jul 01 '16 at 10:13
What does `df.dtypes` show? Does column `value` contain numbers with a `dtype` other than `float`, or does it contain some text? – jezrael Jul 01 '16 at 10:16
`print(df.dtypes)` `ptf object name object value object dtype: object` – Asher11 Jul 01 '16 at 10:20
First try converting the column to float: `df['value'] = df['value'].astype(float)`. – jezrael Jul 01 '16 at 10:22
I edited the original post with more data. `df['value'] = df['value'].astype(float)` proved unsuccessful – Asher11 Jul 01 '16 at 10:22
OK, try converting the column to float, because I think the values only look like numbers but are really strings (object dtype). – jezrael Jul 01 '16 at 10:23
You need to use `df['value'] = pd.to_numeric(df['value'], errors='coerce')`; you then get `NaN` wherever a problematic value cannot be converted to a number. You can also filter these values with: `df[pd.to_numeric(df['value'], errors='coerce').isnull()]`. – jezrael Jul 01 '16 at 11:22
I think `isnull()`, try `df = pd.DataFrame({'value':['8','a','1'], 'B':[4,5,6]})`. – jezrael Jul 01 '16 at 11:31
Solved. My initial table was flawed, and the way I gave it, it indeed had duplicates. Sorry, everyone, for the waste of time. – Asher11 Jul 01 '16 at 12:44
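
The checks discussed in the comments can be sketched together as follows. The sample frame is hypothetical (it is not the asker's data) and is built only to reproduce both symptoms: a `value` column stored as strings and a repeated (name, ptf) pair. The names `bad_rows`, `dupes` and `wide` are illustrative.

```python
import pandas as pd

# Hypothetical sample: 'value' holds strings (object dtype), one entry
# is not a number, and the pair (alfa, CORE1) appears twice.
df = pd.DataFrame({
    'name':  ['alfa', 'alfa', 'beta', 'beta'],
    'ptf':   ['CORE1', 'CORE1', 'CORE1', 'CORE2'],
    'value': ['0.1', '0.3', '0.7', 'n/a'],
})

# 1. Convert to numbers; anything that cannot be parsed becomes NaN.
df['value'] = pd.to_numeric(df['value'], errors='coerce')
bad_rows = df[df['value'].isnull()]

# 2. List every row whose (name, ptf) pair occurs more than once;
#    such rows are what make df.pivot raise the duplicate-entries ValueError.
dupes = df[df.duplicated(subset=['name', 'ptf'], keep=False)]

# 3. pivot_table aggregates the duplicates (here with the mean)
#    instead of raising.
wide = df.pivot_table(index='name', columns='ptf', values='value',
                      aggfunc='mean')

print(bad_rows, dupes, wide, sep='\n\n')
```

Inspecting `dupes` first is usually worthwhile: if the duplicates are a data error, as they turned out to be here, fixing the table and using plain `pivot` is safer than silently averaging them with `pivot_table`.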