1

I have the following toy example of what I am trying to achieve:

library ("lattice")
library ("latticeExtra")
data (iris)
xyplot(Sepal.Width ~ Sepal.Length | Species, data = iris, panel = function(x, y, ...) { 
        panel.xyplot(x, y, ...)
        panel.lmlineq(x, y, adj = c(1,0), 
            lty = 1,col.text='red', pos= 4,
            col.line = "blue", digits = 1,r.squared =TRUE)
        panel.text(7, 4, round(cor(x, y),3), font=2, adj=c(0.5,-0.6))
        panel.text(7, 4, round(cor.test(x,y)$p.value, 3), font=1, adj=c(0.5,0.6))},
xlab = "Sepal.Length", ylab = "Sepal_Width")

enter image description here

So, as you can see, I have a data frame with levels (Species) that I'd like to plot (all at the same time) showing their regression line with the R-squared value and also printing their cor() and cor.test() outputs. Preferably, in an aesthetically pleasing manner.

Has anyone tried to do something similar? Is there an efficient way to do it?

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
MEC
  • 71
  • 1
  • 10

1 Answers1

4

I do things like this using the tidyverse workflow, with help from the ggplot2 extension ggpmisc. There is a lot of room for customization, and you can minimize or streamline things if you wish.

library(tidyverse)
library(broom)
library(ggpmisc)

analysis <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(model = map(data, ~lm(Sepal.Length ~ Sepal.Width, data = .)),
    cor = map(data, ~tidy(cor.test(.x$Sepal.Length, .x$Sepal.Width), 3)))

stats <- analysis %>%
  unnest(cor)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(shape = 21) +
  geom_text(data = stats, aes(label = sprintf("r = %s", round(estimate, 3)), x = 7, y = 4)) +
  geom_text(data = stats, aes(label = sprintf("p = %s", round(p.value, 3)),  x = 7, y = 3.8)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
               formula = y ~ x,
               parse = TRUE) +
  facet_wrap(~Species)

enter image description here

Jake Kaupp
  • 7,892
  • 2
  • 26
  • 36
  • The code is clever. However, with my own data, I have a small problem. I'd like to ask whether is possible to include an if statement in the cor.test. There's a group among my observations, let's say it would be (virginica) that, in my data, has 2 observations only, so the cor.test returns = not enough finite observations. In your experience, would it be possible to prevent this from stopping the whole process? – MEC Jan 20 '17 at 16:34
  • You could write a wrapper around `cor.test` to do that. – Jake Kaupp Jan 20 '17 at 16:46
  • I am not very advanced in R. I've seen these possible answers http://stackoverflow.com/questions/23156856/ddply-cor-test-with-error-handling But I can't extract the correct values to be passed to the map group_cor2 <- function(DF, x , y) { check <- try(cor.test(DF[,x], DF[,y])$estimate, silent = TRUE) if(class(check) == "try-error") return(data.frame(cor = NA)) return(data.frame(cor = cor.test(DF[,x], DF[,y])$estimate)) return(data.frame(cor = cor.test(DF[,x], DF[,y])$p.value)) } – MEC Jan 20 '17 at 18:34
  • 1
    You could simply check if the rows of the data are greater than 2. If it's less, return a warning, if it's equal or greater, proceed to the cor.test. Don't store the output, and it will return the cor.test results. – Jake Kaupp Jan 20 '17 at 18:39