
I am trying to add point shapes to a plot of a regression model. Here is the example:

library(ggpubr)
data(iris)
iris$ran <- as.factor(rep(c(1:2), each = 75))
fit <- lm(Sepal.Length ~ Petal.Width+Species+ran, data = iris)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1],
                             color = names(fit$model)[3], shape = names(fit$model)[4])) +
  geom_point() +
  geom_smooth(aes_string(fill = names(fit$model)[3], color = names(fit$model)[3]),
              method = "lm", col = "red", fullrange = TRUE) +
  labs(x = expression(paste("Petal Width")),
       y = expression(paste("Sepal Length")),
       caption = paste("R2 =", signif(summary(fit)$r.squared, 2),
                       "\tAdj R2 =", signif(summary(fit)$adj.r.squared, 2),
                       "\tIntercept =", signif(fit$coef[[1]], 2),
                       "\tSlope =", signif(fit$coef[[2]], 2),
                       "\tP =", signif(summary(fit)$coef[2, 4], 2))) +
  theme_classic2(base_size = 14)

I am getting a plot with four regression lines, one for each combination of the factors. I would rather have regression lines only for "Species", with different point shapes for "ran" (without "ran" adding regression lines to the plot).

I would also like to change "R2" to R^2 (with a superscript), which I cannot do with the current script, and to relabel the legend for ran as "Random", with entries "Factor1" and "Factor2".

Thank you in advance for your help.

Pedro J. Aphalo
AST
  • You already got one regression line per Species and shape for ran, right? Or do you want e.g. one line per setosa-ran1 and another for setosa-ran2 ? – danlooo Sep 09 '21 at 11:56
  • Superscript in ggplot: https://stackoverflow.com/questions/37825558/how-to-use-superscript-with-ggplot2 – danlooo Sep 09 '21 at 11:57
  • @danlooo I want a regression line for each "Species" only. I also want different shapes for "ran" (no regression line), rather than just dots. Regarding superscript, none of the suggestions deals with using a superscript when there is a value to be estimated before printing. – AST Sep 09 '21 at 13:48
  • @AST Be aware that the values for slope and intercept in the caption are only for setosa. The only function that you seem to be using from 'ggpubr' is `theme_classic2()` which you can replace by `theme_classic()` without affecting the plot and then use `library(ggplot2)` instead of `library(ggpubr)`. I added an alternative answer. – Pedro J. Aphalo Sep 10 '21 at 17:57
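Regarding a superscript next to a value that must be computed first: one common approach (a sketch, not taken from either answer below) is to round the statistic and then splice it into a plotmath expression with base R's bquote(), where .() marks the parts to evaluate. It assumes only ggplot2; the caption wording is illustrative.

```r
library(ggplot2)

iris$ran <- factor(rep(1:2, each = 75))
fit <- lm(Sepal.Length ~ Petal.Width + Species + ran, data = iris)

# Round first; .() splices the already-computed numbers into the expression
r2     <- signif(summary(fit)$r.squared, 2)
adj_r2 <- signif(summary(fit)$adj.r.squared, 2)

# In plotmath, R^2 draws a superscript 2, == draws "=", and ~ inserts a space
cap <- bquote(R^2 == .(r2) ~ "   Adj." ~ R^2 == .(adj_r2))

p <- ggplot(iris, aes(Petal.Width, Sepal.Length)) +
  geom_point() +
  labs(caption = cap)
print(p)
```

The same pattern works for the other statistics in the caption: compute each value, then add another `~ Name == .(value)` term.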

2 Answers


You can use scale_shape_manual() to set the point shapes; its values are R's standard point-symbol (pch) codes. You can also type the Unicode character ² directly in the caption string to print R²:

library(ggpubr)
#> Loading required package: ggplot2
library(tidyverse)

iris$ran <- as.factor(rep(c(1:2), each = 75))
fit <- lm(Sepal.Length ~ Petal.Width + Species + ran, data = iris)

fit$model %>%
  ggplot(aes_string(
    x = names(fit$model)[2], y = names(fit$model)[1]
  )) +
  geom_point(aes_string(shape = names(fit$model)[4]), size = 2.5) +
  geom_smooth(aes_string(color = names(fit$model)[3]),
    method = "lm", fullrange = TRUE, se = FALSE
  ) +
  theme_classic2(base_size = 14) +
  scale_shape_manual(values = c(17, 18)) +
  labs(
    x = "Petal Width",
    y = "Sepal Length",
    caption = paste(
      "R² =", signif(summary(fit)$r.squared, 2),
      "\tAdj R² =", signif(summary(fit)$adj.r.squared, 2),
      "\tIntercept =", signif(fit$coef[[1]], 2),
      "\tSlope =", signif(fit$coef[[2]], 2),
      "\tP =", signif(summary(fit)$coef[2, 4], 2)
    )
  )
#> `geom_smooth()` using formula 'y ~ x'

Created on 2021-09-09 by the reprex package (v2.0.1)

To make the regression lines easier to see, I set se = FALSE in geom_smooth().
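The question also asked for the shape legend to be titled "Random" with entries "Factor1" and "Factor2". A minimal sketch of that, assuming plain ggplot2 (aes() in place of aes_string(), which newer ggplot2 versions deprecate, and theme_classic() in place of ggpubr's theme_classic2()), passes name and labels to scale_shape_manual(). As in the answer above, colour is mapped only in geom_smooth() and shape only in geom_point(), so the fitted lines are grouped by Species alone.

```r
library(ggplot2)

iris$ran <- factor(rep(1:2, each = 75))
fit <- lm(Sepal.Length ~ Petal.Width + Species + ran, data = iris)

p <- ggplot(iris, aes(Petal.Width, Sepal.Length)) +
  # shape is mapped only here, so ran does not split the fitted lines
  geom_point(aes(shape = ran), size = 2.5) +
  # colour is mapped only here: one OLS line per Species
  geom_smooth(aes(colour = Species), method = "lm", formula = y ~ x,
              fullrange = TRUE, se = FALSE) +
  # legend title and entry labels for the ran factor
  scale_shape_manual(name = "Random", values = c(17, 18),
                     labels = c("Factor1", "Factor2")) +
  # "\u00b2" is the Unicode superscript two, written as an escape
  labs(x = "Petal Width", y = "Sepal Length",
       caption = paste("R\u00b2 =", signif(summary(fit)$r.squared, 2))) +
  theme_classic(base_size = 14)
print(p)
```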

danlooo
  • Thanks @danlooo but this still doesn't solve my problem: There are four slope lines in the plot but I expect only three (for "Species" only). Also, can you please tell me how you wrote "R²" in the code, I am struggling to do it. – AST Sep 09 '21 at 18:08
  • @danlooo Thanks for your contribution! It gets the job done and it is nicely written. I added an alternative answer using package 'ggpmisc' of which I am the author. My answer shows how I would create such a plot. – Pedro J. Aphalo Sep 10 '21 at 17:50
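The number of lines depends on where shape = ran is mapped. ggplot2 splits each layer into groups by the interaction of all discrete variables mapped in that layer, so mapping shape in the global aes(), as in the question, also splits geom_smooth(). This short check compares the two mappings by counting the groups in the smooth layer with ggplot2's layer_data().

```r
library(ggplot2)

iris$ran <- factor(rep(1:2, each = 75))

# shape = ran in the global aes(): every layer, including the smooth, is
# grouped by Species x ran (only the combinations present in the data)
p_global <- ggplot(iris, aes(Petal.Width, Sepal.Length,
                             colour = Species, shape = ran)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# shape = ran only in geom_point(): the smooth is grouped by Species alone
p_local <- ggplot(iris, aes(Petal.Width, Sepal.Length, colour = Species)) +
  geom_point(aes(shape = ran)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Number of fitted lines = number of groups in layer 2 (the smooth)
n_lines <- function(p) length(unique(layer_data(p, 2)$group))
n_lines(p_global)  # 4: setosa-1, versicolor-1, versicolor-2, virginica-2
n_lines(p_local)   # 3: one per Species
```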

This alternative answer is, I think, simpler. With this approach it is also possible to use fullrange = TRUE and se = FALSE and to leave the points uncoloured, but that yields a plot that badly misrepresents the data. Although it does not produce the same caption, the code in my answer shows the results of each of the three fits automatically, and it would work unchanged with a different number of factor levels.

Because the iris data are only used as an example here, we can ignore the fact that both widths and lengths are random variables and use OLS. Otherwise, major axis regression would be preferable, and the code below could be rewritten using stat_ma_line() and stat_ma_eq(), with slight adjustments to the arguments passed to them.

library(ggpmisc)
#> Loading required package: ggpp
#> Loading required package: ggplot2
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate
iris$ran <- factor(rep(c(1:2), each = 75), labels = paste("Factor", 1:2))

ggplot(iris, aes(Petal.Width, Sepal.Length, colour = Species)) +
  geom_point(aes(shape = ran)) +
  stat_poly_line() + # se = FALSE can be added
  stat_poly_eq(aes(label = paste(after_stat(rr.label),
#                                 after_stat(adj.rr.label),
                                 after_stat(eq.label), 
                                 after_stat(p.value.label),
#                                 after_stat(n.label),
                                 sep = "*\", \"*"))) +
  labs(x = "Petal Width", y = "Sepal Length", shape = "Random") +
  theme_classic(base_size = 14)

Created on 2021-09-10 by the reprex package (v2.0.1)
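For the major axis variant mentioned above, a rough sketch might swap the OLS layers for stat_ma_line() and stat_ma_eq() and keep everything else. This is an assumption-laden outline, not the author's code: it assumes a ggpmisc version that provides both functions (they fit the line through the 'lmodel2' package, which must be installed), relies on their default MA method, and uses only the rr.label and eq.label computed variables.

```r
library(ggpmisc)  # attaches ggplot2; the MA fits use the 'lmodel2' package

iris$ran <- factor(rep(1:2, each = 75), labels = paste("Factor", 1:2))

p_ma <- ggplot(iris, aes(Petal.Width, Sepal.Length, colour = Species)) +
  geom_point(aes(shape = ran)) +
  # major axis (MA) line per Species instead of an OLS line
  stat_ma_line() +
  # R^2 and fitted equation for each Species, joined as in the OLS version
  stat_ma_eq(aes(label = paste(after_stat(rr.label),
                               after_stat(eq.label),
                               sep = "*\", \"*"))) +
  labs(x = "Petal Width", y = "Sepal Length", shape = "Random") +
  theme_classic(base_size = 14)
print(p_ma)
```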

Pedro J. Aphalo