
How can I control the numeric display in the regression equation using the round or sprintf function? I also could not figure out how to use dev="tikz" when using eq.with.lhs = "hat(Y)~`=`~".

library(ggplot2)
library(ggpmisc)

# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, 
                      y, 
                      group = c("A", "B"), 
                      y2 = y * c(0.5,2),
                      block = c("a", "a", "b", "b"))

str(my.data)

# plot
ggplot(data = my.data, mapping=aes(x = x, y = y2, colour = group)) +
        geom_point() +
        geom_smooth(method = "lm", se =  FALSE, formula = y ~ poly(x=x, degree = 2, raw = TRUE)) +
        stat_poly_eq(
                       mapping     = aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~"))
                     , data        = NULL
                     , geom        = "text"
                     , formula     = y ~ poly(x, 2, raw = TRUE)
                     , eq.with.lhs = "hat(Y)~`=`~"
                     , eq.x.rhs    = "X"
                     , label.x     = 0
                     , label.y     = 2e6
                     , vjust       = c(1.2, 0)
                     , position    = "identity"
                     , na.rm       = FALSE
                     , show.legend = FALSE
                     , inherit.aes = TRUE
                     , parse       = TRUE
                     ) +
        theme_bw()


SlowLoris
MYaseen208
  • There is an important difference between round and sprintf. The first rounds the value according to mathematical rules, and the second just cuts the number according to the specified format. I prefer sprintf, because it is a method to print values. – FlorianSchunke May 16 '16 at 04:29
  • What do you mean by rounding? Do you mean the physical rounding of the coefficients (e.g. 67.5 to 68), or adapting the function so that it is closer to the inserted function (minus noise)? The former would be a programming question, while the latter is more mathematical in nature. It would also be clearer to ask only one question per question (it would be easy to create a separate minimal working example for the question about dev="tikz"). Otherwise you can end up wanting to accept two answers, since each answers only one part. – takje May 22 '16 at 18:32
  • The point about two questions in one is a very good one. Anyway, my answer now answers both. `round` and `signif` return numeric values, `sprintf` returns a character value. Depending on the format specification, `sprintf` will use the equivalent of `round` or `signif` to convert a number. – Pedro J. Aphalo May 23 '16 at 05:40

2 Answers


1) The code below answers the dev="tikz" part of the question, provided it is used with 'ggpmisc' (version >= 0.2.9):

\documentclass{article}

\begin{document}

<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_chunk$set(fig.path = 'figure/pos-', fig.align = 'center', fig.show = 'hold',
               fig.width = 7, fig.height = 6, size = "footnotesize", dev="tikz")
@


<<>>=
library(ggplot2)
library(ggpmisc)
@

<<>>=
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x,
                      y,
                      group = c("A", "B"),
                      y2 = y * c(0.5,2),
                      block = c("a", "a", "b", "b"))

str(my.data)
@

<<>>=
# plot
ggplot(data = my.data, mapping=aes(x = x, y = y2, colour = group)) +
  geom_point() +
  geom_smooth(method = "lm", se =  FALSE, 
              formula = y ~ poly(x=x, degree = 2, raw = TRUE)) +
  stat_poly_eq(
    mapping     = aes(label = paste("$", ..eq.label.., "$\\ \\ \\ $",
                       ..rr.label.., "$", sep = ""))
    , geom        = "text"
    , formula     = y ~ poly(x, 2, raw = TRUE)
    , eq.with.lhs = "\\hat{Y} = "
    , output.type = "LaTeX"
   ) +
  theme_bw()
@

\end{document}


Thanks for suggesting this enhancement, I will surely also find a use for it myself!

2) Answer to the round and sprintf part of the question. You cannot use round or sprintf to change the number of digits: stat_poly_eq currently applies signif, with three significant digits as argument, to the whole vector of coefficients. If you want full control, you could use another statistic, stat_fit_glance, which is also in ggpmisc (>= 0.2.8) and uses broom::glance internally. It is much more flexible, but you will have to take care of all the formatting yourself within the call to aes. At the moment there is one catch: broom::glance does not seem to work correctly with poly, so you will need to write the polynomial equation out explicitly and pass it as the argument to formula.
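
As a small illustration of the difference between the three functions mentioned above, here is a sketch in base R only. The coefficient values are made up for the example and have nothing to do with the fitted model.

```r
# Made-up values standing in for regression coefficients
coefs <- c(1234.5678, 0.0123456, -56.789)

rounded  <- round(coefs, digits = 2)   # fixed number of decimal places, stays numeric
signifed <- signif(coefs, digits = 3)  # fixed number of significant digits, stays numeric
printed  <- sprintf("%.2f", coefs)     # formatted text, returns character

print(rounded)
print(signifed)
print(printed)
```

Note that signif keeps three significant digits for every value regardless of magnitude, which is why it suits a whole vector of coefficients, while sprintf gives a character vector that is ready to paste into a label.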

Pedro J. Aphalo
  • Please [see](http://stackoverflow.com/q/38686029/707145) a relevant question [here](http://stackoverflow.com/q/38686029/707145). Thanks – MYaseen208 Aug 05 '16 at 20:41

Myaseen208,

Here is a workaround for the problem of creating .tex output with ggpmisc::stat_poly_eq(). I was able to confirm that you cannot currently combine stat_poly_eq() and "hat(Y)~`=`~" with library(tikzDevice) to create LaTeX .tex output. I have, however, provided a solution that creates the correct .tex output in the interim.

Pedro Aphalo, the creator of the ggpmisc package, has very kindly accepted the enhancement request for ggpmisc::stat_poly_eq(), following the bug report filed and referenced below.

Code Example:

The following code will produce a graphic without a hat symbol:

# Load required packages
requiredPackages <- c("ggplot2", "ggpmisc", "tikzDevice", "latex2exp")

# ipak - Check to see if the package is installed, if not install and then load...
ipak <- function(pkg)
{
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

ipak(requiredPackages)

# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x ^ 2 + x ^ 3) + rnorm(length(x), mean = 0, sd = mean(x ^ 3) / 4)
my.data <- data.frame(
  x, y,
  group = c("A", "B"),
  y2 = y * c(0.5, 2),
  block = c("a", "a", "b", "b")
)

# Define the formula once so the smoother and the equation use the same model
formulaDefined <- (y ~ (poly(x = x, degree = 2, raw = TRUE)))

gp <- ggplot(data = my.data, mapping = aes(x = x, y = y2, colour = group))
gp <- gp + geom_point()
gp <- gp + geom_smooth(method = "lm", se =  FALSE, formula = formulaDefined )
gp <- gp + stat_poly_eq(
  aes(label = paste(..eq.label.., "~~~", ..rr.label.., sep = "")),
#  eq.with.lhs = "italic(hat(y))~`=`~",
  formula     = formulaDefined,
  geom        = "text",
  label.x     = 0,
  label.y     = 2e6,
  vjust       = c(1.2, 0),
  position    = "identity",
  na.rm       = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE,
  parse       = TRUE)
gp <- gp + theme_bw()
gp


We can now modify this code and its tikz output to create the desired result:

Tikz Code Solution

The first step is to modify the code so that it writes the required .tex file. With this done, we can then use gsub() to find the relevant lines in the .tex file and replace {\itshape y}; with {\^{y}}; [lines 646 and 693].

# Load required packages
requiredPackages <- c("ggplot2", "ggpmisc", "tikzDevice", "latex2exp")

# ipak - Check to see if the package is installed, if not install and then load...
ipak <- function(pkg)
{
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

ipak(requiredPackages)

# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x ^ 2 + x ^ 3) + rnorm(length(x), mean = 0, sd = mean(x ^ 3) / 4)
my.data <- data.frame(
  x, y,
  group = c("A", "B"),
  y2 = y * c(0.5, 2),
  block = c("a", "a", "b", "b")
)

setwd("~/dev/stackoverflow/37242863")

texFile <- "./test2.tex"
# setup tex output file
tikz(file = texFile, width = 5.5, height = 5.5)

# Define the formula once so the smoother and the equation use the same model
formulaDefined <- (y ~ (poly(x = x, degree = 2, raw = TRUE)))

gp <- ggplot(data = my.data, mapping = aes(x = x, y = y2, colour = group))
gp <- gp + geom_point()
gp <- gp + geom_smooth(method = "lm", se =  FALSE, formula = formulaDefined )
gp <- gp + stat_poly_eq(
  aes(label = paste(..eq.label.., "~~~", ..rr.label.., sep = "")),
#  eq.with.lhs = "italic(hat(y))~`=`~",
  formula     = formulaDefined,
  geom        = "text",
  label.x     = 0,
  label.y     = 2e6,
  vjust       = c(1.2, 0),
  position    = "identity",
  na.rm       = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE,
  parse       = TRUE)
gp <- gp + theme_bw()
gp
dev.off()

## Now take the test2.tex file and replace the relevant markup to
## add the hat back to the y in the .tex output file.

texOutputFile <- readLines(texFile)
# fixed = TRUE matches the pattern literally; a separate name avoids overwriting y
texWithHat <- gsub("itshape y", "^{y}", texOutputFile, fixed = TRUE)
cat(texWithHat, file = texFile, sep = "\n")
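
The replacement step can also be tried in isolation, without running the plot. The sketch below assumes two made-up lines in place of the real tikzDevice output (the actual file is much longer and its exact markup may differ) and writes to a temporary file rather than test2.tex.

```r
# Hypothetical lines standing in for the tikzDevice output
texLines <- c(
  "\\node[text=black] at (10,20) {{\\itshape y} = 1.5 + 2 x};",
  "\\node[text=black] at (30,40) {unrelated label};"
)

# fixed = TRUE treats the pattern as a literal string, not a regular expression
fixedLines <- gsub("itshape y", "^{y}", texLines, fixed = TRUE)

# Round-trip through a temporary file, as the workaround does with the real file
tmp <- tempfile(fileext = ".tex")
writeLines(fixedLines, tmp)
print(readLines(tmp))
```

Only the line containing the italic y changes; every other line is written back untouched.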

Tex Test Framework:

To test the solution, we can create a small LaTeX test harness. You can load this file [t1.tex] in RStudio and then compile it; it will pull in test2.tex, generated by the code presented above.

N.B. RStudio is a great platform for compiling LaTeX output from R.

\documentclass{article}

\usepackage{tikz}

\begin{document}

\begin{figure}[ht]
\input{test2.tex}
\caption{Sample output from tikzDevice 2}
\end{figure}

\end{document}

Result: [image of the compiled figure, not reproduced here]

Alternate Solution

Another option might be to use geom_text(). The downside of this approach is that you have to write the regression line equation function yourself. This was discussed in your previous post: Adding Regression Line Equation and R2 on graph
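
A minimal sketch of what such a hand-written equation function could look like, assuming a quadratic fit. The helper name lm_eq_label and its fmt argument are hypothetical and not part of ggplot2 or ggpmisc. Each coefficient is formatted with sprintf() and wrapped in quotes so that parsing the label as a plotmath expression does not drop trailing zeros.

```r
# Hypothetical helper: build a plotmath label for a quadratic lm fit
lm_eq_label <- function(x, y, fmt = "%.2f") {
  fit <- lm(y ~ poly(x, 2, raw = TRUE))
  b   <- unname(coef(fit))
  num <- sprintf(fmt, abs(b))          # formatted magnitudes, as character
  sgn <- ifelse(b < 0, "-", "+")       # sign placed between the terms
  paste0(
    "hat(Y)~`=`~", if (b[1] < 0) "-" else "", "\"", num[1], "\"",
    " ", sgn[2], " \"", num[2], "\"~X",
    " ", sgn[3], " \"", num[3], "\"~X^2"
  )
}

# Noise-free toy data, so the coefficients are known in advance
x <- 1:10
y <- 1 - 2 * x + 3 * x^2
lab <- lm_eq_label(x, y)
print(lab)
```

The resulting string can then be passed as the label to geom_text() or annotate("text", ...) with parse = TRUE, and changing fmt controls the number of digits shown.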

If you need a detailed solution [with geom_text], then ping me. The other option is to file a bug report with ggpmisc [done by me] and see whether the author has already addressed it or can address it.

Bug Report: https://bitbucket.org/aphalo/ggpmisc/issues/1/stat_poly_eq-fails-when-used-with

I hope the above helps.

Community
Technophobe01
  • @pedro-aphalo Is there a better way to do this with ggpmisc? – Technophobe01 May 22 '16 at 18:53
  • I will investigate this problem for the next release. I assume one would need to output valid LaTeX code instead of an R expression. It will be a couple of weeks, as yesterday I submitted ggpmisc 0.2.8 to CRAN and it was accepted this morning. I have experience with LaTeX, so it is easily doable. – Pedro J. Aphalo May 22 '16 at 19:27
  • Pedro, thanks for the quick response. Your help is much appreciated. Take care. The solution as it stands should allow @MYaseen208 to proceed in the interim. – Technophobe01 May 22 '16 at 19:31