Pandas dataframe row reduction/compression (for rows having the same value)

Question

I have a df such as:

seed  value
 1      2 
 1      3
 1      4
 2      20
 2      60

I would like to get a shorter dataframe (df_short) with the average value for each seed, such as:

seed  value   
  1     3        
  2     40

I tried df.groupby("seed"), which seems to work, but I get a DataFrameGroupBy object, which I cannot use as a dataframe. Then, following the question "Convert DataFrameGroupBy object to DataFrame pandas", I tried the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({'seed' : [1 ,1, 1, 2, 2],
                   'value' : [2, 3, 4, 20, 60]})
df_short = df.groupby('seed').aggregate(np.mean)
print(df_short)          # works as expected
print(df_short['seed'])  # error (KeyError)

I expect list(df_short.columns.values) to be ['seed', 'value'].

Andrew · Accepted Answer · 2015-03-19T03:33:59.477


If you run the first three commands (through df = df.groupby('seed').mean()), you will notice that the values 1 and 2 are the index of df, so they cannot be accessed using df['seed']. We can use reset_index to convert seed into a column. Note that the new index values are 0 and 1.

In [48]:

import pandas as pd

df = pd.DataFrame({'seed' : [1 ,1, 1, 2, 2],
                   'value' : [2, 3, 4, 20, 60]})
df = df.groupby('seed').mean()   # 'seed' becomes the index here
df.reset_index(inplace=True)     # move 'seed' back into a regular column

print(df['seed'])
print(df['value'])
print(list(df.columns.values))

0    1
1    2
Name: seed, dtype: int64
0     3
1    40
Name: value, dtype: int64
['seed', 'value']
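
As an alternative sketch (not part of the original answer), groupby also accepts as_index=False, which keeps the grouping key as an ordinary column so that no separate reset_index call is needed. The name df_short is taken from the question. On recent pandas versions the mean comes back as float64, so the values may print as 3.0 and 40.0.

```python
import pandas as pd

df = pd.DataFrame({'seed': [1, 1, 1, 2, 2],
                   'value': [2, 3, 4, 20, 60]})

# as_index=False keeps 'seed' as a regular column instead of the index
df_short = df.groupby('seed', as_index=False).mean()

print(list(df_short.columns))   # both columns are present
print(df_short['seed'])         # no KeyError this time
print(df_short['value'])
```

Both approaches should give the same two-column frame; which one to use is mostly a matter of taste.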

edited Mar 19 '15 at 03:33

answered Mar 19 '15 at 02:48


Well, that works with aggregate too, but I am trying to get y['seed'] – Emmet B Mar 19 '15 at 02:49
Sorry, I was confused. Do you want to return the values 1 and 2 in a dataframe? – Andrew Mar 19 '15 at 02:51
Yes, I need 1, 2 vs 3, 40. My ultimate goal is to plot seed vs value with pylab.errorbar, but errorbar gets confused when there are multiple values with the same seed. So I need to convert the dataframe. – Emmet B Mar 19 '15 at 02:53
I updated my answer to reflect your desired output. It returns a data frame with the values 1, 2 in a column called seed. – Andrew Mar 19 '15 at 02:57
Well, I have to get both columns, not just one. `print list(y.columns.values)` should give me `['seed', 'value']`, not just `['seed']` or just `['value']` – Emmet B Mar 19 '15 at 03:14
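
For the plotting goal mentioned in the comments, here is a minimal sketch of how the per-seed data might be prepared. It assumes the error bars should be the sample standard deviation of value within each seed. The thread does not say this, so it is only an illustration, and the name stats is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'seed': [1, 1, 1, 2, 2],
                   'value': [2, 3, 4, 20, 60]})

# One row per seed with the mean and the sample standard deviation
# (using std as the error measure is an assumption, not from the thread)
stats = df.groupby('seed')['value'].agg(['mean', 'std']).reset_index()

print(list(stats.columns))
print(stats)
```

The three columns could then be passed to errorbar as x, y, and yerr, for example errorbar(stats['seed'], stats['mean'], yerr=stats['std']).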
