Pandas: Iterate group by object

Question

Given a data frame as follows:

In [8]:
df
Out[8]:
Experiment  SampleVol   Mass
0   A   1   11
1   A   1   12
2   A   2   20
3   A   2   17
4   A   2   21
5   A   3   28
6   A   3   29
7   A   4   35
8   A   4   38
9   A   4   35
10  B   1   12
11  B   1   11
12  B   2   22
13  B   2   24
14  B   3   30
15  B   3   33
16  B   4   37
17  B   4   42
18  C   1   8
19  C   1   7
20  C   2   17
21  C   2   19
22  C   3   29
23  C   3   30
24  C   3   31
25  C   4   41
26  C   4   44
27  C   4   42

I would like to run a correlation study on the data for each Experiment. Specifically, I want to calculate the correlation of 'SampleVol' with the mean of 'Mass'.

The groupby function can help me to get the mean of the masses:

grp = df.groupby(['Experiment', 'SampleVol'])
grp.mean()

Out[17]:
                       Mass
Experiment  SampleVol   
A            1         11.500000
             2         19.333333
             3         28.500000
             4         36.000000
B            1         11.500000
             2         23.000000
             3         31.500000
             4         39.500000
C            1          7.500000
             2         18.000000
             3         30.000000
             4         42.333333

I understand that for each data frame I should use some numpy function to compute the correlation coefficient. My question is: how can I iterate over the data frames for each Experiment?

The following is an example of the desired output.

Out[18]:

Experiment  Slope   Intercept
A            0.91   0.01
B            1.1    0.02
C            0.95   0.03

Thank you very much.

If iteration is your goal, you can simply do `for label, row in df.iterrows()`. — spicypumpkin, Mar 07 '17 at 00:17
@Posh_Pumpkin I would like to iterate the dataframes like {"SampleVol": [1, 2, 3, 4]; "Mass":[11.5, 19.33, 28.5, 36]} — ju., Mar 07 '17 at 15:19
Should post data as code object. People more likely to try to answer with runnable example. Note answer below, which I like, had to make up its own data, so does not directly answer your question. — pauljohn32, Apr 16 '19 at 03:58

score 2 · Accepted Answer · edited Jun 22 '22 at 16:17

You'll want to group on just the 'Experiment' column, rather than on the two columns as you have above. You can then iterate through the groups and perform a simple linear regression on each group's values using the code below:

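from scipy import stats
import pandas as pd
import numpy as np

# Group on the column name itself rather than a one-element list, so that
# each group key is a plain label such as 'A' (recent pandas versions
# yield 1-tuples such as ('A',) when grouping by a one-element list).
grp = df.groupby('Experiment')

output = pd.DataFrame(columns=['Slope', 'Intercept'])

for name, group in grp:
    # Simple linear regression of Mass on SampleVol within this experiment
    slope, intercept, r_value, p_value, std_err = stats.linregress(group['SampleVol'], group['Mass'])
    output.loc[name] = [slope, intercept]

print(output)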
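
The loop above regresses on the raw Mass values. As a minimal sketch of the same idea applied to the question's own table, with Mass averaged per SampleVol first as the question describes, something like the following should work. The `masses` dictionary is transcribed from the table in the question, and the names `means`, `rows` and `result` are my own choices for illustration. The numbers in the question's desired output appear to be placeholders, so the fitted values will not match them.

```python
import pandas as pd
from scipy import stats

# Mass readings transcribed from the question, keyed by (Experiment, SampleVol).
masses = {
    ("A", 1): [11, 12], ("A", 2): [20, 17, 21], ("A", 3): [28, 29], ("A", 4): [35, 38, 35],
    ("B", 1): [12, 11], ("B", 2): [22, 24], ("B", 3): [30, 33], ("B", 4): [37, 42],
    ("C", 1): [8, 7], ("C", 2): [17, 19], ("C", 3): [29, 30, 31], ("C", 4): [41, 44, 42],
}
df = pd.DataFrame(
    [(exp, vol, m) for (exp, vol), ms in masses.items() for m in ms],
    columns=["Experiment", "SampleVol", "Mass"],
)

# Mean Mass per (Experiment, SampleVol), kept as ordinary columns.
means = df.groupby(["Experiment", "SampleVol"], as_index=False)["Mass"].mean()

# Fit one line per experiment: mean Mass as a function of SampleVol.
rows = {}
for name, group in means.groupby("Experiment"):
    fit = stats.linregress(group["SampleVol"], group["Mass"])
    rows[name] = {"Slope": fit.slope, "Intercept": fit.intercept}

result = pd.DataFrame.from_dict(rows, orient="index")
result.index.name = "Experiment"
print(result)
```

Collecting the fits in a dictionary and building the frame once at the end avoids growing a DataFrame row by row, which is a design preference rather than a requirement here.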

For those curious, this is how I generated the dummy data:

df = pd.DataFrame()
df['Experiment'] = np.array(pd.date_range('2018-01-01', periods=12, freq='6h').strftime('%a'))
df['SampleVol'] = np.random.uniform(1,5,12)
df['Mass'] = np.random.uniform(10,42,12)
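
As an alternative to an explicit loop over the groups, the per-group fit can probably also be written with `groupby(...).apply(...)`. The sketch below is only an illustration under a few assumptions of mine: `fit_line` is a hypothetical helper, `np.polyfit` is swapped in for `stats.linregress`, and a seeded generator with hand-written day labels stands in for the date-based dummy data so that the run is reproducible.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the dummy data: three groups of four rows each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Experiment": np.repeat(["Mon", "Tue", "Wed"], 4),
    "SampleVol": rng.uniform(1, 5, 12),
    "Mass": rng.uniform(10, 42, 12),
})

def fit_line(group):
    """Least-squares line of Mass on SampleVol for one group (hypothetical helper)."""
    slope, intercept = np.polyfit(group["SampleVol"], group["Mass"], 1)
    return pd.Series({"Slope": slope, "Intercept": intercept})

# Selecting the two numeric columns keeps the grouping column out of each
# group passed to fit_line; the result is indexed by Experiment.
fits = df.groupby("Experiment")[["SampleVol", "Mass"]].apply(fit_line)
print(fits)
```

Because `fit_line` returns a Series, `apply` assembles the results into a frame with one row per experiment and the Slope and Intercept columns, which matches the shape of the desired output in the question.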
