I have a df such as:

seed  value
 1      2 
 1      3
 1      4
 2      20
 2      60

I would like to get a shorter DataFrame (df_short) with the average value for each seed, such as:

seed  value   
  1     3        
  2     40

I tried df.groupby("seed"), which seems to work, but it returns a DataFrameGroupBy object, which I cannot use as a DataFrame. Following the question "Convert DataFrameGroupBy object to DataFrame pandas", I tried the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({'seed': [1, 1, 1, 2, 2],
                   'value': [2, 3, 4, 20, 60]})
df_short = df.groupby('seed').aggregate(np.mean)
print(df_short)          # works as expected
print(df_short['seed'])  # error: raises KeyError: 'seed'

list(df_short.columns.values) should give me ['seed', 'value'].

Emmet B

1 Answer

If you run the first three commands (through df = df.groupby('seed').mean()), you will notice that the values 1 and 2 are the index of df and cannot be accessed using df['seed']. We can use reset_index to convert seed into a column. Note that the new index values are 0 and 1.

import pandas as pd

df = pd.DataFrame({'seed': [1, 1, 1, 2, 2],
                   'value': [2, 3, 4, 20, 60]})
df = df.groupby('seed').mean()   # 'seed' is now the index, not a column
df.reset_index(inplace=True)     # move 'seed' back into a regular column

print(df['seed'])
print(df['value'])
print(list(df.columns.values))

0    1
1    2
Name: seed, dtype: int64
0     3
1    40
Name: value, dtype: int64
['seed', 'value']
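
As a variation on the same idea, here is a minimal sketch that keeps seed as a column from the start by passing as_index=False to groupby, so no reset_index call is needed. The name df_short is taken from the question. The exact dtype of the averaged column may differ between pandas versions, so treat this as an illustration rather than the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({'seed': [1, 1, 1, 2, 2],
                   'value': [2, 3, 4, 20, 60]})

# as_index=False keeps the grouping key as an ordinary column
# instead of turning it into the index of the result.
df_short = df.groupby('seed', as_index=False).mean()

print(df_short['seed'])
print(df_short['value'])
print(list(df_short.columns.values))
```

Both columns are then available by name, which is the shape needed for plotting seed against value.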
Andrew
  • Well, that works with aggregate too, but I am trying to get y['seed']. – Emmet B Mar 19 '15 at 02:49
  • Sorry, I was confused. Do you want to return the values 1 and 2 in a dataframe? – Andrew Mar 19 '15 at 02:51
  • Yes, I need 1, 2 vs 3, 40. My ultimate goal is to plot seed vs value with pylab.errorbar, but errorbar gets confused when multiple values have the same seed, so I need to convert the dataframe. – Emmet B Mar 19 '15 at 02:53
  • I updated my answer to reflect your desired output. It returns a data frame with the values 1, 2 in a column called seed. – Andrew Mar 19 '15 at 02:57
  • Well, I have to get both columns, not just one. `print list(y.columns.values)` should give me `['seed', 'value']`, not just `['seed']` or just `['value']`. – Emmet B Mar 19 '15 at 03:14