Given a data frame like the following:
In [8]:
df
Out[8]:
Experiment SampleVol Mass
0 A 1 11
1 A 1 12
2 A 2 20
3 A 2 17
4 A 2 21
5 A 3 28
6 A 3 29
7 A 4 35
8 A 4 38
9 A 4 35
10 B 1 12
11 B 1 11
12 B 2 22
13 B 2 24
14 B 3 30
15 B 3 33
16 B 4 37
17 B 4 42
18 C 1 8
19 C 1 7
20 C 2 17
21 C 2 19
22 C 3 29
23 C 3 30
24 C 3 31
25 C 4 41
26 C 4 44
27 C 4 42
For each Experiment, I would like to run a correlation study on its rows. Specifically, I want to calculate the correlation of 'SampleVol' with the mean of 'Mass' at each sample volume.
The groupby function gives me the mean mass for each Experiment and SampleVol:
grp = df.groupby(['Experiment', 'SampleVol'])
grp.mean()
Out[17]:
                           Mass
Experiment SampleVol
A          1          11.500000
           2          19.333333
           3          28.500000
           4          36.000000
B          1          11.500000
           2          23.000000
           3          31.500000
           4          39.500000
C          1           7.500000
           2          18.000000
           3          30.000000
           4          42.333333
I understand that for each Experiment's data frame I should use some numpy function to compute the correlation coefficient. My question is: how can I iterate over the data frames, one per Experiment?
The following is an example of the desired output.
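To make the question concrete, here is a minimal sketch of the kind of iteration I have in mind. It assumes that iterating over `df.groupby("Experiment")` yields one sub-frame per Experiment and that `np.corrcoef` is a suitable function for the coefficient. The `raw` dictionary and the `corr` name are only illustrative; the data is rebuilt from the frame shown above.

```python
import numpy as np
import pandas as pd

# Data rebuilt from the frame above: {Experiment: {SampleVol: [Mass, ...]}}
raw = {
    "A": {1: [11, 12], 2: [20, 17, 21], 3: [28, 29], 4: [35, 38, 35]},
    "B": {1: [12, 11], 2: [22, 24], 3: [30, 33], 4: [37, 42]},
    "C": {1: [8, 7], 2: [17, 19], 3: [29, 30, 31], 4: [41, 44, 42]},
}
df = pd.DataFrame(
    [
        (exp, vol, mass)
        for exp, vols in raw.items()
        for vol, masses in vols.items()
        for mass in masses
    ],
    columns=["Experiment", "SampleVol", "Mass"],
)

corr = {}
# Each iteration yields the group key and the rows belonging to that Experiment
for name, sub in df.groupby("Experiment"):
    # Mean Mass per SampleVol within this Experiment
    means = sub.groupby("SampleVol")["Mass"].mean()
    # Pearson correlation between SampleVol and its mean Mass
    r = np.corrcoef(means.index.to_numpy(), means.to_numpy())[0, 1]
    corr[name] = r

print(corr)
```

If this is roughly the right approach, `corr` would hold one coefficient per Experiment.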
Out[18]:
Experiment  Slope  Intercept
A           0.91   0.01
B           1.1    0.02
C           0.95   0.03
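Since the desired output lists a slope and an intercept rather than a correlation coefficient, a straight-line fit per Experiment may be closer to what I need. The following is only a sketch under that assumption: it uses `np.polyfit` with degree 1 on the per-volume means, and the `raw`, `means`, and `result` names are illustrative. The numbers it produces come from the data above and will not match the placeholder values in the example table.

```python
import numpy as np
import pandas as pd

# Data rebuilt from the frame above: {Experiment: {SampleVol: [Mass, ...]}}
raw = {
    "A": {1: [11, 12], 2: [20, 17, 21], 3: [28, 29], 4: [35, 38, 35]},
    "B": {1: [12, 11], 2: [22, 24], 3: [30, 33], 4: [37, 42]},
    "C": {1: [8, 7], 2: [17, 19], 3: [29, 30, 31], 4: [41, 44, 42]},
}
df = pd.DataFrame(
    [
        (exp, vol, mass)
        for exp, vols in raw.items()
        for vol, masses in vols.items()
        for mass in masses
    ],
    columns=["Experiment", "SampleVol", "Mass"],
)

# Mean Mass per (Experiment, SampleVol), kept as ordinary columns
means = df.groupby(["Experiment", "SampleVol"], as_index=False)["Mass"].mean()

rows = []
for name, sub in means.groupby("Experiment"):
    # Least-squares line: mean Mass = Slope * SampleVol + Intercept
    slope, intercept = np.polyfit(
        sub["SampleVol"].to_numpy(), sub["Mass"].to_numpy(), 1
    )
    rows.append({"Experiment": name, "Slope": slope, "Intercept": intercept})

result = pd.DataFrame(rows).set_index("Experiment")
print(result)
```

Collecting one dictionary per group and building the frame at the end keeps the loop explicit, which is the part I am unsure about.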
Thank you very much.