I am looking for a way to calculate the multiple correlation coefficient (http://en.wikipedia.org/wiki/Multiple_correlation) in R. Is there a built-in function to calculate it? I have one dependent variable and three independent ones. I have not been able to find anything online; any ideas?
What do you mean by "program the formula"? Please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) about asking questions in a way that makes it easy for people to help you. More resources [here.](http://stackoverflow.com/help/how-to-ask) – Bryan Hanson Apr 17 '15 at 02:25
I mean: is there a built-in function to calculate such a thing, or do you have to calculate it yourself? – user1594047 Apr 17 '15 at 02:30
3 Answers
The easiest way to calculate what you seem to be asking for when you refer to 'the multiple correlation coefficient' (i.e. the correlation between two or more independent variables on the one hand and one dependent variable on the other) is to fit a multiple linear regression (predicting the values of the dependent variable from the values of two or more independent variables) and then calculate the correlation coefficient between the predicted and observed values of the dependent variable.
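For reference, the definition behind the Wikipedia link in the question can be written as follows, assuming an ordinary least-squares fit with an intercept. Here y is the dependent variable, \hat{y} the fitted values from regressing y on x_1, ..., x_p, \mathbf{c} the vector of correlations between each x_j and y, and R_{xx} the correlation matrix of the predictors:

```latex
R = \operatorname{cor}(y, \hat{y}), \qquad
R^2 = \mathbf{c}^{\top} R_{xx}^{-1}\, \mathbf{c}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```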
Here, for example, we create a linear model called mpg.model, with mpg as the dependent variable and wt and cyl as the independent variables, using the built-in mtcars dataset:
> mpg.model <- lm(mpg ~ wt + cyl, data = mtcars)
Having created the above model, we correlate the observed values of mpg (which are embedded in the object, within the model data frame) with the predicted values for the same variable (also embedded):
> cor(mpg.model$model$mpg, mpg.model$fitted.values)
[1] 0.9111681
R will in fact do this calculation for you, though without saying so, when you ask it for the summary of a model (as in Brian's answer): the summary of an lm object contains R-squared, which for a model with an intercept (as here) is the square of the correlation coefficient between the predicted and observed values of the dependent variable. So an alternative way to get the same result is to extract R-squared from the summary.lm object and take its square root:
> sqrt(summary(mpg.model)$r.squared)
[1] 0.9111681
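A third route, which follows the correlation-matrix definition on the Wikipedia page directly, is sketched below. It assumes the same mtcars model (mpg on wt and cyl); the variable names c_xy and R_xx are just illustrative. It computes R from pairwise correlations only and then checks it against the two approaches above.

```r
# Pairwise correlations among the response and the two predictors
vars <- c("mpg", "wt", "cyl")
C <- cor(mtcars[, vars])

c_xy <- C[c("wt", "cyl"), "mpg"]             # correlation of each predictor with mpg
R_xx <- C[c("wt", "cyl"), c("wt", "cyl")]    # correlations among the predictors

# R^2 = t(c) %*% solve(R_xx) %*% c, then take the square root
R_matrix <- sqrt(drop(crossprod(c_xy, solve(R_xx, c_xy))))

# The same quantity obtained from the fitted model
mpg.model <- lm(mpg ~ wt + cyl, data = mtcars)
R_fitted  <- cor(mtcars$mpg, fitted(mpg.model))
R_summary <- sqrt(summary(mpg.model)$r.squared)

print(c(R_matrix, R_fitted, R_summary))
```

All three values should agree up to floating-point error, which illustrates that the regression-based answer and the textbook formula describe the same quantity.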
I feel that I should point out, however, that the term 'multiple correlation coefficient' is ambiguous.

This is too long. There are simpler inbuilt functions to implement what the OP has asked. – 89_Simple Jun 13 '19 at 10:20
Okay, it's been nearly a year and you haven't answered that question. The truth is that there are NO inbuilt functions in R to implement what the OP has asked. – Westcroft_to_Apse Apr 30 '20 at 11:16
The built-in function lm gives at least one version; I'm not sure if this is what you are looking for:
fit <- lm(yield ~ N + P + K, data = npk)
summary(fit)
Gives:
Call:
lm(formula = yield ~ N + P + K, data = npk)
Residuals:
Min 1Q Median 3Q Max
-9.2667 -3.6542 0.7083 3.4792 9.3333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 54.650 2.205 24.784 <2e-16 ***
N1 5.617 2.205 2.547 0.0192 *
P1 -1.183 2.205 -0.537 0.5974
K1 -3.983 2.205 -1.806 0.0859 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.401 on 20 degrees of freedom
Multiple R-squared: 0.3342, Adjusted R-squared: 0.2343
F-statistic: 3.346 on 3 and 20 DF, p-value: 0.0397
More info on what's going on at ?summary.lm and ?lm.
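The multiple correlation coefficient itself is the square root of the "Multiple R-squared" line above. A minimal sketch of pulling it out programmatically, assuming the same npk fit (N, P and K are factors in npk, so lm fits the dummy terms N1, P1 and K1 seen in the output):

```r
fit <- lm(yield ~ N + P + K, data = npk)

# Multiple R: square root of "Multiple R-squared" from summary(fit)
R <- sqrt(summary(fit)$r.squared)

# Equivalent: correlation between observed and fitted yield
R_check <- cor(npk$yield, fitted(fit))

print(R)
```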

It would probably be the Multiple R-squared; this is the one I am looking for: http://en.wikipedia.org/wiki/Multiple_correlation – user1594047 Apr 17 '15 at 02:43
You can type `summary.lm` at the console and see the code it uses, then figure out what it does. Looks like `ans$r.squared <- mss/(mss + rss); ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)` is the relevant part. `mss` is the model (regression) sum of squares, `rss` is the residual sum of squares. Anything with `df` is degrees of freedom. – Bryan Hanson Apr 17 '15 at 02:53
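Those two lines can be reproduced by hand. The sketch below assumes an unweighted fit with an intercept (so `df.int` is 1 and `mss` is computed around the mean of the fitted values) and reuses the npk model from the answer:

```r
fit <- lm(yield ~ N + P + K, data = npk)

f   <- fitted(fit)
res <- residuals(fit)
n   <- length(res)

mss    <- sum((f - mean(f))^2)  # model sum of squares (intercept case)
rss    <- sum(res^2)            # residual sum of squares
rdf    <- fit$df.residual       # residual degrees of freedom
df.int <- 1                     # 1 because the model has an intercept

r.squared     <- mss / (mss + rss)
adj.r.squared <- 1 - (1 - r.squared) * ((n - df.int) / rdf)

# Multiple correlation coefficient
multiple_R <- sqrt(r.squared)
```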
Given that I have three arrays x, y, z for my independent variables and one array t for my dependent variable, how should the above solution be used? Thanks for your time. – user1594047 Apr 17 '15 at 13:42
Read the documentation for `lm` and perhaps look at your data structures with `str(x)` for example. – Bryan Hanson Apr 17 '15 at 14:28
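For the setup described in the question (three predictor vectors x, y, z and a response t), a minimal sketch might look like this. The data here are simulated and purely hypothetical; replace them with your own vectors of equal length.

```r
set.seed(1)
# Hypothetical data standing in for the OP's predictors x, y, z and response t
x <- rnorm(50)
y <- rnorm(50)
z <- rnorm(50)
t <- 2 * x - y + 0.5 * z + rnorm(50)

d   <- data.frame(x, y, z, t)
fit <- lm(t ~ x + y + z, data = d)

# Multiple correlation coefficient of t with x, y and z
multiple_R <- sqrt(summary(fit)$r.squared)
print(multiple_R)
```

Putting the vectors into a data frame first keeps the formula readable and avoids relying on objects lying around in the global environment.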
Try this:
# load sample data
data(mtcars)
# calculate the correlation coefficients between all variables in `mtcars`
# using the built-in function
M <- cor(mtcars)
# M is a matrix of correlation coefficients, which you can display just by
# running
print(M)
# If you want to plot the correlation coefficients
library(corrplot)
corrplot(M, method="number",type= "lower",insig = "blank", number.cex = 0.6)
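The pairwise matrix M does contain enough information to get multiple correlation coefficients, via the identity (M^-1)_ii = 1 / (1 - R_i^2), where R_i is the multiple correlation of variable i with all the other columns. A sketch under that assumption (it requires M to be invertible, which holds for mtcars):

```r
M <- cor(mtcars)

# For each variable i: R_i^2 = 1 - 1 / (M^-1)_ii, where R_i is the multiple
# correlation of variable i regressed on all the other columns
M_inv <- solve(M)
multiple_R_all <- sqrt(1 - 1 / diag(M_inv))

print(round(multiple_R_all, 3))
```

The first entry (mpg) should match sqrt(summary(lm(mpg ~ ., data = mtcars))$r.squared). For a specific subset of predictors, compute cor() on just those columns plus the response.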

'Try this' provides no explanation of what your code does. And what your code does is _not_ in fact what was asked for: rather than giving the multiple correlation coefficient, it plots a matrix of correlation coefficients between multiple pairs of variables. – Westcroft_to_Apse Jun 04 '19 at 18:53
Okay. I have edited the answer. By the way, if the OP had tried my solution, I would expect them to have printed `M` to see what is being plotted. I was avoiding spoon-feeding the solution. – 89_Simple Jun 13 '19 at 10:17
This still does not do what the OP asked. What your code does is first to print and then to plot a matrix of correlation coefficients. What the OP asked for is the multiple correlation coefficient. My answer (which you have downvoted) provides that. – Westcroft_to_Apse Jun 13 '19 at 11:08