0

I would like to make a corrplot, but instead of the correlation coefficient, it would display the slope of a linear regression between each pair of variables.

If possible, it would behave like the corrplot function and show which slopes are significant and which are not. To make the variables comparable, I guess it would be preferable to normalise all the slopes.

I want to do this because I sometimes have a low correlation/R2 but still a significant slope. So having both the correlation matrix and the "slope" matrix would be great.

Do you know if there is an existing function like this? Or how to do it? Thank you.

EDIT: Here is a link explaining why I have a difference between the slope and R2/correlation: https://statisticsbyjim.com/regression/low-r-squared-regression/

Here is an example of what I get using corrplot. What I would like is a similar function, but with the slope instead of the correlation.

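library(corrplot)
M <- cor(mtcars)
# The significance test needs the raw data, not the correlation matrix
test <- cor.mtest(mtcars, conf.level = 0.95)
corrplot(M, order = "hclust", tl.col = "black",
         p.mat = test$p, sig.level = 0.10)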
Mimosa
  • 47
  • 4
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 23 '22 at 13:44
  • 1
    I don't think there's an existing function to do it, but you could certainly write one. Though it does seem a little strange, correlation is symmetric (`cor(x, y)` is the same as `cor(y, x)`), but the assumptions of linear models make the response meaningfully different from the predictor(s). So theoretically it seems a bit strange (and that strangeness is probably related to why there's not a built-in function to do it). What are you hoping to do with this that a standard corrplot isn't doing? – Gregor Thomas Aug 23 '22 at 13:48
  • 2
    [This question and answer](https://stackoverflow.com/a/51953714/903061) will get you 80% of the way there. The asker/answerer did a lot of work to make it nicely efficient. – Gregor Thomas Aug 23 '22 at 13:52
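
To illustrate the asymmetry raised in the comments above, here is a minimal sketch using two arbitrary columns of the built-in `mtcars` data (the choice of `wt` and `mpg` is only an assumption for illustration). It shows that the slope depends on which variable is the response, while the correlation does not, and that once both variables are standardised the slope coincides with Pearson's r.

```r
# Illustration only: wt and mpg are arbitrary columns of the built-in mtcars data
x <- mtcars$wt
y <- mtcars$mpg

r <- cor(x, y)                           # symmetric: same as cor(y, x)
slope_yx <- unname(coef(lm(y ~ x))[2])   # slope when y is the response
slope_xy <- unname(coef(lm(x ~ y))[2])   # slope when x is the response

# The two slopes differ, but their product equals r^2
print(c(slope_yx = slope_yx, slope_xy = slope_xy, r = r))
print(all.equal(slope_yx * slope_xy, r^2))

# After standardising both variables, the slope equals r
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
slope_std <- unname(coef(lm(zy ~ zx))[2])
print(all.equal(slope_std, r))
```

So a matrix of fully normalised slopes would reproduce the correlation matrix; a slope matrix only carries extra information if the slopes stay in their original units.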

2 Answers

0

Here you have the points with the best-fit line (lower panel) and the regression parameters (upper panel):

#Panel of regression equations
panel.corr <- function(x, y, ...){
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  # Fit y ~ x once so the label "y = a + (b)*x" matches the model
  cf <- round(coef(lm(y ~ x)), 3)
  txt <- paste0("y=", cf[1], " + (", cf[2], ")*x")
  text(0.5, 0.5, txt, cex = 1)
}

#Panel of histograms
panel.hist <- function(x, ...){
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5) )
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks
  len <- length(breaks)
  y <- h$counts/max(h$counts)
  rect(breaks[-len], 0, breaks[-1], y, col = "lightblue")
}

#Panel of scatterplots with the fitted regression line
panel.scat <- function(x, y, ...) {
  # pairs() has already set up the panel, so only add points and the line
  points(x, y)
  abline(lm(y ~ x))
}

#Plot
pairs(mtcars[, c(1,3:7)],  
      lower.panel = panel.scat,
      upper.panel = panel.corr,
      diag.panel = panel.hist,
      gap = 0.3, 
      main = "Scatterplot matrix of `mtcars`")
Greg3er
  • 27
  • 6
-1

Following the tutorial on this page, and to answer your question:

library(tidyverse)
library(ggpubr)
theme_set(theme_pubr())

# Load the data
data("marketing", package = "datarium")
head(marketing, 4)

ggplot(marketing, aes(x = youtube, y = sales)) +
  geom_point() +
  stat_smooth()

cor(marketing$sales, marketing$youtube)

model <- lm(sales ~ youtube, data = marketing)
model

The output of calling model is:

## 
## Call:
## lm(formula = sales ~ youtube, data = marketing)
## 
## Coefficients:
## (Intercept)      youtube  
##      8.4391       0.0475

And here is the information you're looking for:

  • The intercept is the (Intercept) coefficient.
  • The slope is the value of the youtube coefficient here.
  • If you are working with multiple regression, you need to take into account all the coefficients of your model, one for each term in the R formula.

If you just want to compare two features for which you previously computed the correlation, swap them into the formula and you'll get a simple regression model for that pair. I would advise you to check the assumptions of linear regression first, just in case.
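
To do this for every pair of variables, one possible sketch is to loop over the columns and collect the slopes and their p-values into two matrices. The helper `slope_matrices()` and its `standardise` argument are hypothetical names introduced here, not part of any package, and the four `mtcars` columns are only an example.

```r
# Hypothetical helper (not from any package): fits lm(response ~ predictor)
# for every ordered pair of columns and stores the slope and its p-value.
# Rows are responses, columns are predictors.
slope_matrices <- function(data, standardise = FALSE) {
  data <- as.data.frame(data)
  if (standardise) data <- as.data.frame(scale(data))
  vars <- names(data)
  k <- length(vars)
  # Diagonal: a variable regressed on itself has slope 1 (p-value set to 0)
  slope <- diag(1, k)
  pval <- matrix(0, k, k)
  dimnames(slope) <- dimnames(pval) <- list(vars, vars)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i == j) next
      cf <- summary(lm(data[[i]] ~ data[[j]]))$coefficients
      slope[i, j] <- cf[2, "Estimate"]
      pval[i, j] <- cf[2, "Pr(>|t|)"]
    }
  }
  list(slope = slope, p = pval)
}

cols <- c("mpg", "disp", "hp", "wt")   # example columns, chosen arbitrarily
res <- slope_matrices(mtcars[, cols])
print(round(res$slope, 3))   # not symmetric: slope of mpg ~ wt differs from wt ~ mpg
print(round(res$p, 4))       # symmetric for simple regressions

std <- slope_matrices(mtcars[, cols], standardise = TRUE)
print(round(std$slope, 3))   # matches cor(mtcars[, cols])
```

Assuming the corrplot package is installed, the result could then be drawn with something like `corrplot(res$slope, is.corr = FALSE, p.mat = res$p, sig.level = 0.05)`. Note that in a simple regression the p-value of the slope is the same as the p-value of the Pearson correlation test, so the significance pattern should match the one from `cor.mtest`, and standardised slopes reduce to the correlations themselves.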

Hope it helps.

bvittrant
  • 79
  • 6
  • 1
    I think the hard part about this question is that they want to do it for every pair of variables, like a cor plot. – Gregor Thomas Aug 23 '22 at 13:56
  • I haven't heard of any function that does that. But there should be a way to write one using the piece of code provided by the tutorial ^^ – bvittrant Aug 23 '22 at 13:59