Pearson correlation coefficient R and R-squared coefficient of determination are two completely different statistics.
You can take a look at
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
and
https://en.wikipedia.org/wiki/Coefficient_of_determination
update
Persons's r coefficient is a measure of linear correlation between two variables and is

where bar x
and bar y
are the means of the samples.
R2 coefficient of determination is a measure of goodness of fit and is

where hat y
is the predicted value of y
and bar y
is the mean of the sample.
Thus
- they measure different things
r**2
is not equal to R2
because their formula are totally different
update 2
r**2
is equal to R2
only in the case that you calculate r
with a variable (say y
) and the predicted variable hat y
from a linear model
Let's make an example using the two arrays you provided
import numpy as np
import pandas as pd
import scipy.stats as sps
import statsmodels.api as sm
from sklearn.metrics import r2_score as R2
import matplotlib.pyplot as plt
a = np.array([32.0, 25.97, 26.78, 35.85, 30.17, 29.87, 30.45, 31.93, 30.65, 35.49,
28.3, 35.24, 35.98, 38.84, 27.97, 26.98, 25.98, 34.53, 40.39, 36.3])
b = np.array([28.778585, 31.164268, 24.690865, 33.523693, 29.272448, 28.39742,
28.950092, 29.701189, 29.179174, 30.94298 , 26.05434 , 31.793175,
30.382706, 32.135723, 28.018875, 25.659306, 27.232124, 28.295502,
33.081223, 30.312504])
df = pd.DataFrame({
'x': a,
'y': b,
})
df.plot(x='x', y='y', marker='.', ls='none', legend=False);

now we fit a linear regression model
mod = sm.OLS.from_formula('y ~ x', data=df)
mod_fit = mod.fit()
print(mod_fit.summary())
output
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.580
Model: OLS Adj. R-squared: 0.557
Method: Least Squares F-statistic: 24.88
Date: Mon, 29 Mar 2021 Prob (F-statistic): 9.53e-05
Time: 14:12:15 Log-Likelihood: -36.562
No. Observations: 20 AIC: 77.12
Df Residuals: 18 BIC: 79.12
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 16.0814 2.689 5.979 0.000 10.431 21.732
x 0.4157 0.083 4.988 0.000 0.241 0.591
==============================================================================
Omnibus: 6.882 Durbin-Watson: 3.001
Prob(Omnibus): 0.032 Jarque-Bera (JB): 4.363
Skew: 0.872 Prob(JB): 0.113
Kurtosis: 4.481 Cond. No. 245.
==============================================================================
and compute both r**2
and R2
and we can see that in this case they're equal
predicted_y = mod_fit.predict(df.x)
print("R2 :", R2(df.y, predicted_y))
print("r^2:", sps.pearsonr(df.y, predicted_y)[0]**2)
output
R2 : 0.5801984323799696
r^2: 0.5801984323799696
You did R2(df.x, df.y)
that can't be equal to our computed values because you used a measure of goodness-of-fit between independent x
and dependent y
variables. We instead used both r
and R2
with y
and predicted value of y
.