How to create a python dataframe containing the mean of some rows of another dataframe

Question

I have a pandas DataFrame containing some values:

                        id  pair      value  subdir
taylor_1e3c_1s_56C  taylor  6_13  -0.398716    run1 
taylor_1e3c_1s_56C  taylor  6_13  -0.397820    run2 
taylor_1e3c_1s_56C  taylor  6_13  -0.397310    run3 
taylor_1e3c_1s_56C  taylor  6_13  -0.390520    run4 
taylor_1e3c_1s_56C  taylor  6_13  -0.377390    run5 
taylor_1e3c_1s_56C  taylor  8_11  -0.393604    run1
taylor_1e3c_1s_56C  taylor  8_11  -0.392899    run2
taylor_1e3c_1s_56C  taylor  8_11  -0.392473    run3
taylor_1e3c_1s_56C  taylor  8_11  -0.389959    run4
taylor_1e3c_1s_56C  taylor  8_11  -0.387946    run5

what I would like to do is to isolate the rows that have the same index, id, and pair, compute the mean over the value column, and put it all in a new dataframe. Because I have now effectively averaged over all the possible values of subdir, that column should also be removed. So the output should look something like this

                        id  pair      value
taylor_1e3c_1s_56C  taylor  6_13  -0.392351
taylor_1e3c_1s_56C  taylor  8_11  -0.391376

How should I do it in pandas?

jezrael · Accepted Answer · 2018-02-07T14:00:17.743

Use syntactic sugar - groupby by Series and indices and aggregate mean:

df = df['value'].groupby([df.index, df['id'], df['pair']]).mean().reset_index(level=[1,2])
print (df)
                        id  pair     value
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376

Classic solution - first reset_index for column from indices and then groupby by columns names and aggregate mean:

df = df.reset_index().groupby(['index','id','pair'])['value'].mean().reset_index(level=[1,2])
print (df)
                        id  pair     value
index                                     
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376

Detail:

print (df.reset_index())
                index      id  pair     value subdir
0  taylor_1e3c_1s_56C  taylor  6_13 -0.398716   run1
1  taylor_1e3c_1s_56C  taylor  6_13 -0.397820   run2
2  taylor_1e3c_1s_56C  taylor  6_13 -0.397310   run3
3  taylor_1e3c_1s_56C  taylor  6_13 -0.390520   run4
4  taylor_1e3c_1s_56C  taylor  6_13 -0.377390   run5
5  taylor_1e3c_1s_56C  taylor  8_11 -0.393604   run1
6  taylor_1e3c_1s_56C  taylor  8_11 -0.392899   run2
7  taylor_1e3c_1s_56C  taylor  8_11 -0.392473   run3
8  taylor_1e3c_1s_56C  taylor  8_11 -0.389959   run4
9  taylor_1e3c_1s_56C  taylor  8_11 -0.387946   run5

After aggregate mean get MultiIndex with 3 levels:

print (df.reset_index().groupby(['index','id','pair'])['value'].mean())
index               id      pair
taylor_1e3c_1s_56C  taylor  6_13   -0.392351
                            8_11   -0.391376
Name: value, dtype: float64

So is necessesary reset_index for convert second ant third level to columns:

print (df.reset_index()
        .groupby(['index','id','pair'])['value']
        .mean()
        .reset_index(level=[1,2]))
                        id  pair     value
index                                     
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376

Thanks much! I figured out most of it by looking the different concepts up, but why do I need to reset the index? And what does it mean to resect it at level 1,2? — Ferdinando Randisi, Feb 07 '18 at 13:58
I edit answer, thank you for comment. If something unclear, give me know. — jezrael, Feb 07 '18 at 14:01

How to create a python dataframe containing the mean of some rows of another dataframe

1 Answers1

Linked