I would like to add a regression line to my correlation scatter plot, but I can't get this to work with plot_ly(). I've already tried some solutions from other posts in this forum, but none of them worked.

My data frame looks like the following (only a small part of it):

data frame

My code for the plot and the actual plot-output look like the following:

CorrelationPlot <- plot_ly(data = df.dataCorrelation, x = ~df.dataCorrelation$prod1, 
                           y = ~df.dataCorrelation$prod2, type = 'scatter', mode = 'markers',
                           marker = list(size = 7, color = "#FF9999", line = list(color = "#CC0000", width = 2))) %>%
                    layout(title = "<b> Correlation Scatter Plot", xaxis = list(title = product1), 
                           yaxis = list(title = product2), showlegend = FALSE)

Correlation Scatter Plot Without Line

What I want to have is something like this:

Correlation Scatter Plot With Line

which I have produced with the ggscatter() function:

library(ggpubr)
  ggscatter(df.dataCorrelation, x = "prod1", y = "prod2", color = "#CC0000", shape = 21, size = 2,
            add = "reg.line", add.params = list(color = "#CC0000", size = 2), conf.int = TRUE, 
            cor.coef = TRUE, cor.method = "pearson", xlab = product1, ylab = product2)
                  

How do I add the regression line with plot_ly()?

Edit (updated code):

CorrelationPlot <- plot_ly(data = df.dataCorrelation, x = ~df.dataCorrelation$prod1, 
                           y = ~df.dataCorrelation$prod2, type = 'scatter', mode = 'markers',
                           marker = list(size = 7, color = "#FF9999",
                             line = list(color = "#CC0000", width = 2))) %>%
                   add_trace(x = ~df.dataCorrelation$fitted_values, mode = "lines", type = 'scatter',
                             line = list(color = "black")) %>%
                   layout(title = "<b> Correlation Scatter Plot", xaxis = list(title = product1), 
                           yaxis = list(title = product2), showlegend = FALSE)
  

This gives:

dots

How do I get an actual line for the regression line here?

MikiK
  • Please provide the `df.dataCorrelation` in `dput()` format. Visit [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – UseR10085 Sep 04 '20 at 09:02

2 Answers

I don't think plotly has a ready-made function like ggscatter(). Most likely you have to do it manually: first fit the linear model, then add the fitted values to a data.frame.

I made a data.frame that's like your data:

set.seed(111)
df.dataCorrelation = data.frame(prod1=runif(50,20,60))
df.dataCorrelation$prod2 = df.dataCorrelation$prod1 + rnorm(50,10,5)

fit = lm(prod2 ~ prod1,data=df.dataCorrelation)
fitdata = data.frame(prod1=20:60)
prediction = predict(fit,fitdata,se.fit=TRUE)
fitdata$fitted = prediction$fit

The lower and upper bounds of the band around the line are the fitted values minus and plus 1.96 times the standard error of the fit (an approximate 95% confidence interval):

fitdata$ymin = fitdata$fitted - 1.96*prediction$se.fit
fitdata$ymax = fitdata$fitted + 1.96*prediction$se.fit
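
As a hedged alternative to the manual 1.96 multiplier, `predict()` for `lm` objects can return the band directly with `interval = "confidence"`. It uses the t quantile instead of the normal approximation, so the band comes out slightly wider for small samples. A minimal sketch, reusing the simulated data from above (the names `band`, `pred` and `manual` are only illustrative):

```r
set.seed(111)
df.dataCorrelation <- data.frame(prod1 = runif(50, 20, 60))
df.dataCorrelation$prod2 <- df.dataCorrelation$prod1 + rnorm(50, 10, 5)

fit <- lm(prod2 ~ prod1, data = df.dataCorrelation)
fitdata <- data.frame(prod1 = 20:60)

# Matrix with columns "fit", "lwr", "upr": fitted mean and 95% confidence limits
band <- predict(fit, fitdata, interval = "confidence", level = 0.95)

# Manual version for comparison: normal approximation with 1.96
pred <- predict(fit, fitdata, se.fit = TRUE)
manual <- cbind(lwr = pred$fit - 1.96 * pred$se.fit,
                upr = pred$fit + 1.96 * pred$se.fit)

# Compare the widths of the two bands row by row
summary((band[, "upr"] - band[, "lwr"]) - (manual[, "upr"] - manual[, "lwr"]))
```

The `lwr` and `upr` columns can be used in place of `ymin` and `ymax` in the plotting step.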

We calculate correlation:

COR = cor.test(df.dataCorrelation$prod1,df.dataCorrelation$prod2)[c("estimate","p.value")]
COR_text = paste(c("R=","p="),signif(as.numeric(COR),3),collapse=" ")
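
If a more readable label is wanted, one possible sketch (not part of the original answer) builds it with `sprintf()` and `format.pval()`. The names `ct` and `cor_label` are made up for illustration, and the data is the simulated data frame from above:

```r
set.seed(111)
df.dataCorrelation <- data.frame(prod1 = runif(50, 20, 60))
df.dataCorrelation$prod2 <- df.dataCorrelation$prod1 + rnorm(50, 10, 5)

# cor.test() returns an "htest" list; Pearson is the default method
ct <- cor.test(df.dataCorrelation$prod1, df.dataCorrelation$prod2,
               method = "pearson")

# Illustrative label: estimate rounded to 2 decimals, p-value via format.pval(),
# which abbreviates very small p-values instead of printing many digits
cor_label <- sprintf("R = %.2f, p = %s",
                     unname(ct$estimate),
                     format.pval(ct$p.value, digits = 3))
cor_label
```

`cor_label` can then be passed as the annotation `text` in `layout()` in the same way as `COR_text`.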

And put it into plotly:

library(plotly)

df.dataCorrelation %>%
  plot_ly(x = ~prod1) %>%
  add_markers(x = ~prod1, y = ~prod2) %>%
  add_trace(data = fitdata, x = ~prod1, y = ~fitted,
            mode = "lines", type = "scatter", line = list(color = "#8d93ab")) %>%
  add_ribbons(data = fitdata, ymin = ~ymin, ymax = ~ymax,
              line = list(color = "#F1F3F8E6"), fillcolor = "#F1F3F880") %>%
  layout(
    showlegend = FALSE,
    annotations = list(x = 50, y = 50,
                       text = COR_text, showarrow = FALSE)
  )
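
The steps above can also be wrapped so that the prediction grid follows the range of the data instead of the hard-coded `20:60`. The following is only a sketch under assumptions: `regression_band()` is a hypothetical helper name, not a plotly function, and the colours are arbitrary. The plotting part runs only if plotly is installed.

```r
# Hypothetical helper: fits y ~ x and returns an evenly spaced x grid
# with the fitted line and its confidence limits
regression_band <- function(data, x, y, n = 100, level = 0.95) {
  fit <- lm(reformulate(x, response = y), data = data)
  grid <- data.frame(seq(min(data[[x]]), max(data[[x]]), length.out = n))
  names(grid) <- x
  pred <- predict(fit, grid, interval = "confidence", level = level)
  cbind(grid, fitted = pred[, "fit"], ymin = pred[, "lwr"], ymax = pred[, "upr"])
}

# Simulated data, as in the answer above
set.seed(111)
df.dataCorrelation <- data.frame(prod1 = runif(50, 20, 60))
df.dataCorrelation$prod2 <- df.dataCorrelation$prod1 + rnorm(50, 10, 5)

fitdata <- regression_band(df.dataCorrelation, "prod1", "prod2")

if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  plot_ly(df.dataCorrelation, x = ~prod1) %>%
    add_markers(y = ~prod2) %>%
    # Band first, so the line is drawn on top of it
    add_ribbons(data = fitdata, ymin = ~ymin, ymax = ~ymax,
                line = list(color = "transparent"),
                fillcolor = "rgba(141,147,171,0.3)") %>%
    add_lines(data = fitdata, y = ~fitted, line = list(color = "#8d93ab")) %>%
    layout(showlegend = FALSE)
}
```

Because the grid is a separate, sorted data frame, the line trace does not depend on the row order or on duplicated rows in the original data.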

StupidWolf
  • Thanks for your quick reply. I followed your version, but unfortunately I don't get the result I want. I have edited my code above and now get something completely wrong, and I don't know where my error is. – MikiK Sep 04 '20 at 09:50
  • The basic plot is the one in my first graphic above, so the only things missing are ```add_trace()``` and ```add_ribbons()```. I don't know why this doesn't work with my data frame. – MikiK Sep 04 '20 at 10:11
  • If you look at the plot made there are twice as many points. I think the data.table is manipulated wrongly. Try converting the original DT to a data frame and run the code above – StupidWolf Sep 04 '20 at 10:20
  • Thank you! The problem is in add_ribbons(), but I don't know why. Without it, the plot works. The only small remaining problem is that the regression line is not a plain line but a line with dots, as you can see above. – MikiK Sep 04 '20 at 11:00
  • If I use ```add_trace(y = ~df.dataCorrelation$fitted_values, ...)```, then I get the plot from the last time, where the values were plotted twice and the line is not a regression line, but connects all points in a zigzag. – MikiK Sep 07 '20 at 04:15
  • Ok @Michi, then you have duplicated values; that's why it zigzags. I have updated the code. You need to make a range for the fitted values. – StupidWolf Sep 07 '20 at 07:36
  • To save all this back and forth I would suggest providing the data up front by using dput(), see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – StupidWolf Sep 07 '20 at 07:37

Another option is to build the plot with ggplot2 and convert it with ggplotly():

library(plotly)
ggplotly(
  ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
    geom_point(color = "#CC0000", shape = 21, size = 2) +
    geom_smooth(method = 'lm') +
    annotate("text",
             label = paste0("R = ", round(with(iris, cor.test(Sepal.Length, Petal.Length))$estimate, 2),
                            ", p = ", with(iris, cor.test(Sepal.Length, Petal.Length))$p.value),
             x = min(iris$Sepal.Length) + 1, y = max(iris$Petal.Length) + 1,
             color = "steelblue", size = 5) +
    theme_classic()
)

UseR10085
  • Thank you! Unfortunately, I have to use ```plot_ly()```, but your version seems much easier. – MikiK Sep 04 '20 at 09:51