
I am trying to modify how stat_regline_equation displays the regression line equation on a plot made with ggscatter from the R package ggpubr. Specifically, I want to show a consistent number of digits of coefficients, even when some rounded coefficients have trailing zeros, which are typically removed. Here is an example:

library(tidyverse)
library(ggpubr)

diamonds %>%
  filter(color %in% c("E", "H", "I")) %>%
  ggscatter(x="carat", y="table", add="reg.line") +
    facet_wrap(~color) +
    stat_regline_equation(label.y.npc = 'top')

[Plot: scatter plots faceted by color (E, H, I) with regression lines and equations]

Graph I is fine, graph H has one trailing zero removed, and graph E has the slope removed entirely because it rounds to 1.00. Based on a great answer I got here as well as a different answer here, I tried to modify the package code using trace(ggpubr:::.stat_lm, edit = TRUE) to modify lines 13 and 14 from

eq.char <- as.character(signif(polynom::as.polynomial(coefs), 2))

to

eq.char <- as.character(formatC(polynom::as.polynomial(coefs), format = "f", digits = 2))

Here is the problem: when given a polynom::polynomial object, signif and round return another polynom::polynomial object, but formatC and sprintf return character vectors:

coefs = diamonds %>%
  filter(color=='E') %>%
  stats::lm(table~carat, .) %>%
  stats::coef()

coefs %>%
  polynom::as.polynomial() %>%
  formatC(format='f', digits=2) %>%
  class() %>%
  print()

coefs %>%
  polynom::as.polynomial() %>%
  signif(digits = 2) %>%
  class() %>%
  print()

[1] "character"
[1] "polynomial"

Therefore my attempt to use formatC above doesn't work. I am guessing that the polynom::polynomial class has built-in methods for round and signif, and none for formatC, so the output is coerced for the latter. I could potentially try to modify the class definition of polynom::polynomial, but at this stage I feel like there has to be an easier way to get trailing zeros on the regression equations that display on my graphs. And I am hoping that this is a common enough desire that someone has an easier solution, or at the very least that an answer might be useful to more people besides myself.
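A minimal base-R illustration of the underlying issue (it does not need polynom; the coefficient values are made up to resemble the facet-E fit, not taken from it). Trailing zeros only exist in a character representation, so anything that turns the value back into a number, such as a polynomial object or a parsed expression, loses them again:

```r
# Illustrative coefficients (assumption: roughly like the facet-E fit)
coefs <- c(56.8312, 0.9987)

# Rounding keeps the result numeric, so a rounded 1.0 is stored as plain 1
rounded <- signif(coefs, 2)        # same rounding as ggpubr's signif(..., 2)
class(rounded)                     # "numeric"
as.character(rounded)              # "57" "1"

# formatC() returns character strings, which is where trailing zeros survive
fixed <- formatC(coefs, format = "f", digits = 2)
fixed                              # "56.83" "1.00"

# Converting back to numeric throws the zeros away again
as.character(as.numeric(fixed))    # "56.83" "1"
```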

Asked by Carl (question edited by Pedro J. Aphalo)
2 Answers


EDIT: This answer only partly fixes the problem. It still displays only 56.83 + 1 x instead of 1.00 x. I'm leaving the answer since someone else may be able to build from this.

A big part of the problem is polynom:::print.polynomial, which contains:

p <- as.character.polynomial(signif(x, digits = digits), decreasing = decreasing)

This will never print trailing zeroes due to as.character.polynomial. So, we can just create a new as.character.polynomial that DOES allow that. I just modified the existing code a bit as an example, and you can tweak it further:

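as.character.polynomial <- function(x, decreasing = FALSE, digits = 2, nsmall = 2, ...) {
  coefs <- unclass(x)
  lp <- length(coefs) - 1
  names(coefs) <- 0:lp                 # names record the power of x
  coefs <- coefs[coefs != 0]           # drop zero terms
  if (length(coefs) == 0)
    return("0")
  if (decreasing)
    coefs <- rev(coefs)
  # take signs from the numeric values and format only the magnitudes,
  # so negative coefficients do not get a doubled "-"
  signs <- ifelse(coefs < 0, "- ", "+ ")
  signs[1] <- if (signs[1] == "- ") "-" else ""
  # trim = TRUE stops format() padding all coefficients to a common width
  p <- format(abs(coefs), digits = digits, nsmall = nsmall, trim = TRUE)
  np <- names(coefs)
  pow <- paste0("x^", np)
  pow[np == "0"] <- ""
  pow[np == "1"] <- "x"
  stars <- rep.int("*", length(p))
  stars[p == "" | pow == ""] <- ""
  paste0(signs, p, stars, pow, collapse = " ")
}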

Example:

coefs %>%
  polynom::as.polynomial() %>%
  as.character.polynomial
# [1] "56.83 + 1.00*x"
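For reference, a small base-R sketch of how format() behaves with the digits and nsmall arguments used here (illustrative values, not the real fit). It formats the whole vector jointly, pads to a common width unless trim = TRUE, and keeps the minus sign inside the string, which matters whenever signs are pasted on separately:

```r
coefs <- c(56.8312, 0.9987)   # illustrative values, not the actual regression

# digits = 2 asks for 2 significant digits; nsmall = 2 then forces
# two decimals on every element of the vector
padded <- format(coefs, digits = 2, nsmall = 2)
padded                        # "56.83" " 1.00"  (padded to a common width)

# trim = TRUE drops the padding
trimmed <- format(coefs, digits = 2, nsmall = 2, trim = TRUE)
trimmed                       # "56.83" "1.00"

# The "-" of a negative value stays inside the formatted string, so a
# separately pasted sign would duplicate it; formatting abs() avoids that
neg <- format(-0.9987, digits = 2, nsmall = 2)        # "-1.00"
mag <- format(abs(-0.9987), digits = 2, nsmall = 2)   # "1.00"
```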

However, .stat_lm will then output it as italic(y)~`=`~56.83 + 1.00*~italic(x), and it will consequently be used as an expression. I'm not familiar enough with ggplot2 to figure out the rest, so I'll leave that to someone else.
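A base-R sketch of the expression problem and one general plotmath workaround, assuming the label string ends up going through parse(): a numeric literal is stored as a number, so 1.00 deparses as 1, whereas a quoted number is a character constant and is kept verbatim. This is a generic plotmath trick, not necessarily what ggpubr or ggpmisc do internally:

```r
# A numeric literal becomes a double once parsed, so "1.00" turns into 1
num_expr <- parse(text = "italic(y) == 56.83 + 1.00 * italic(x)")[[1]]
num_txt <- deparse(num_expr)   # "italic(y) == 56.83 + 1 * italic(x)"

# Quoted numbers are character constants and keep their trailing zeros
chr_expr <- parse(text = 'italic(y) == "56.83" + "1.00" * italic(x)')[[1]]
chr_txt <- deparse(chr_expr)   # contains "1.00" (with the quotes)
```

The quoted form can then be used as a label with parse = TRUE, for example in ggplot2's annotate("text", ..., parse = TRUE); in plotmath, == is drawn as "=".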


Answered by slamballais
  • This will not necessarily work if the character string is converted into an R expression, because `expression()` itself drops trailing zeros. `expression(56.83 + 1.00*x)` returns `expression(56.83 + 1 * x)`. – Pedro J. Aphalo Jun 13 '21 at 21:05
  • @PedroAphalo Yep, you're right. When actually implementing it in the whole call it only gets to "1 * x". I'll leave the answer so someone else can build upon it, and I added a message at the top. – slamballais Jun 14 '21 at 06:31
  • This is a great start, and I really appreciate you taking the time @slamballais. Expression parsing has always confounded me. I guess now is my chance to figure it out. – Carl Jun 14 '21 at 15:36
  • @slamballais Are you o.k. if I edit 'ggpmisc' to define and use an edited version of your `as.character.polynomial()` function? I can add you as a contributor if you send me your name (privately) as you would like it mentioned in DESCRIPTION. – Pedro J. Aphalo Jun 21 '21 at 16:25
  • Hey @PedroAphalo, feel free to just use it without my name :) – slamballais Jun 21 '21 at 19:25
  • Many thanks @slamballais! I ended up using `sprintf()` rather than `format()`, with the call `sprintf("%#.*g", digits, x)` – Pedro J. Aphalo Jun 26 '21 at 07:08
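A sketch of the sprintf() approach mentioned in the last comment, wrapped in a hypothetical helper format_sig() (the name is illustrative, not part of any package). The # flag keeps trailing zeros with %g, and * takes the precision, here the number of significant digits, from the next argument:

```r
# Hypothetical helper: fixed number of significant digits, trailing zeros kept
format_sig <- function(x, digits = 3) {
  sprintf("%#.*g", digits, x)   # "#" keeps zeros, "*" reads precision from `digits`
}

formatted <- format_sig(c(56.8312, 0.8, 1, 0.0012345))
formatted   # "56.8" "0.800" "1.00" "0.00123"
```

One caveat of the # flag: it also always prints a decimal point, so a value of 57 at two significant digits comes out as "57.".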

Since part of the problem is expression(), we get closer to the desired output by using package 'ggtext' and formatting the equations as markdown. Package 'ggpmisc' follows the grammar of graphics, so there is more to type than with 'ggpubr', but it retains all the flexibility of 'ggplot2' and the concept of layers. It formats equations as R expressions by default, but it can also return LaTeX- and markdown-formatted equations. It uses signif() internally, so the number of digits after the decimal point can vary. The number of significant digits can be controlled through parameter coef.digits.

The values retain trailing zeros based on the number of significant digits rather than the number of digits after the decimal point, because small coefficients for high-order terms of a polynomial are important.

I prefer theme_bw() to theme_classic() for plots with panels; theme_classic() would give a plot formatted almost as in the question.
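A quick base-R illustration of why significant digits matter here, using made-up coefficients for a quadratic fit: rounding to a fixed number of decimals can wipe out a small high-order coefficient, while signif() keeps every term informative.

```r
# Made-up coefficients for y = a + b*x + c*x^2 (illustrative only)
coefs <- c(56.8312, 0.8, 0.0012345)

# Fixed decimal places: the x^2 coefficient rounds to 0 and would drop out
fixed_dp <- round(coefs, 2)
fixed_dp[3] == 0           # TRUE

# Significant digits: the x^2 coefficient is kept as 0.00123
sig <- signif(coefs, 3)
sig[3]                     # 0.00123
```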

[code updated for R (>= 4.2.0) and 'ggpmisc' (>= 0.4.5).]

library(ggpmisc)
#> Loading required package: ggpp
#> Loading required package: ggplot2
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate
library(ggtext)

diamonds |>
  subset(color %in% c("E", "H", "I")) |>
  ggplot(aes(x=carat, y=table)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq(aes(label = after_stat(eq.label)),
               geom = "rich_text", output.type = "markdown",
               label.y = 72, label.x = 1, fill = NA, label.size = NA,
               hjust = 0) +
  facet_wrap(~color) +
  theme_bw()

Created on 2022-06-03 by the reprex package (v2.0.1)

Note: The statistic stat_poly_eq() in package 'ggpmisc' is the original piece of code which was copied without acknowledgement and renamed stat_regline_equation() in 'ggpubr'. Meanwhile, development of package 'ggpmisc' has continued and currently stat_poly_eq() has several new features and bug fixes. One of the features added soon after package 'ggtext' made it to CRAN is support for markdown-encoded equations, which I used in the example above.

Answered by Pedro J. Aphalo
  • This is really interesting, thank you for taking the time. This looks like something I could work with. However, I'm having issues reproducing the example above. I'm getting the error `Error in stat_poly_line() : could not find function "stat_poly_line"`. Really sorry if this is a mistake on my part. I will try to work it out one way or another. – Carl Jun 21 '21 at 16:09
  • Looking at examples of the use of 'ggpmisc', I was able to successfully substitute `geom_smooth()` for `stat_poly_line()`. Unfortunately I can't get the slope for Graph H to display as '0.80', no matter what I set `coef.digits` to. The frustrating part is that graph E has correct trailing zeros, as yours does above. Again, I will continue to work on it, but any hints or thoughts would be appreciated. – Carl Jun 21 '21 at 16:47
  • It will take me a few days before I have time to look into this, but it is on my to-do list for 'ggpmisc'. The issue is that in many cases what matters is the number of significant digits, not places after the decimal point. The 1.00 x works because the code that I wrote in 'ggpmisc' treats it as a special case. I think the formatting should be such that the number of places after the leftmost significant digit remains the same in all cases. – Pedro J. Aphalo Jun 22 '21 at 22:02
  • I really appreciate you taking any time at all on this. People like me owe a big debt of gratitude to people like you who are willing to volunteer your time to develop packages. I find myself wondering how big a deal this is when a figure is reviewed for publication. As a grad student I can only guess, and try to cover my bases. – Carl Jun 25 '21 at 18:15
  • @Carl I am myself a user of 'ggpmisc', so I see correcting bugs and adding enhancements as time well invested. Questions like yours help improve a package, in this case for a feature that I expect to use myself frequently. The answer from @slamballais put me on the right track to solving this issue for output other than `expression`. The best way to say thanks is to cite the package in your manuscript, if the journal allows it. I see answering questions and publicly releasing my packages also as a way of paying back for the help I have received online over many years. – Pedro J. Aphalo Jun 26 '21 at 07:26
  • I see, thank you for clarifying! I will cite if at all possible. – Carl Jun 29 '21 at 18:16