0

I'm attempting to have different pvalues and r^2's show up on a plot I'm making using ggplot. My plot needs to be faceted, as I have many different factors of data I'm working on. The graphs I'm trying to make should all be linear models, but I'd like each to have it's own pvalue and r^2 show up in it's respective space.

I've been trying to use sapply to split the dataframe up and then calculate the r^2's and pvalues and then plug them back into the plot using geom_text(label = examplefunction), but I keep receiving the error "Error: Aesthetics must be either length 1 or the same as the data (244): x, y, label, hjust, vjust".

Here's an example using the "tips" dataframe from the reshape package:

library(reshape)

lm_equation <- function(tips){
  sapply(split(tips, list(tips$sex, tips$day)), function(tips){
    m <- lm(tips$tip ~ tips$total_bill, tips);
    eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue, 
                     list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
                          pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
    as.character(as.expression(eq));
  })
}

scat <- ggplot(tips, aes(tip, total_bill))
scat +
  geom_point(size = 5, alpha = 0.9)+
  labs(x = "tip", y = "bill total")+
  geom_smooth(method=lm, colour = "#000000", se = F)+
  facet_grid(sex~day, scales = "free")+
  geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation(tips), parse = T, vjust = "inward", hjust = "inward")+
  theme_classic() + 
  theme(text = element_text(size = 15))

What's frustrating is the code works if I take out the split, but then the pvalues and r^2s are meaningless since they are taken from the entire dataframe rather than just that specific faceted group.

Example of working code:

lm_equation2 <- function(tips){
    m <- lm(tips$tip ~ tips$total_bill, tips);
    eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue, 
                     list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
                          pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
    as.character(as.expression(eq));
  }

scat2 <- ggplot(tips, aes(tip, total_bill))
scat2 +
  geom_point(size = 5, alpha = 0.9)+
  labs(x = "tip", y = "bill total")+
  geom_smooth(method=lm, colour = "#000000", se = F)+
  facet_grid(sex~day, scales = "free")+
  geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation2(tips), parse = T, vjust = "inward", hjust = "inward")+
  theme_classic() + 
  theme(text = element_text(size = 15))

What am I missing here? Do I need to resort to subsetting my data?

Bizzy
  • 15
  • 2
  • You end up with eight labels (one for each facet) but the error is due to the dataset you are using for plotting being much larger. Instead of making a vector of labels only, try making a dataset that contains these labels in a column with other columns for the factors that make up the facets, and the x and y variables for placement of the text. – aosmith Jul 20 '18 at 21:03
  • [This answer](https://stackoverflow.com/a/17024184/2461552) is doing something like my suggestion. – aosmith Jul 20 '18 at 21:05
  • @aosmith I can't seem to get that answer to fit in with what I'm doing... I'm still new to the splitting thing and I keep getting errors that my arguments need to have the same length. Any thoughts? – Bizzy Jul 30 '18 at 14:11

1 Answers1

0

Here is an example of taking what you have and organizing the results into a data.frame that contains all the variables you need for plotting. In particular the faceting variables must be present in the dataset.

First you can put the labels and names of each group (combo of sex and day) into a data.frame as columns. You will want to add a columns for the position each of each equation, using the names of the original x and y variables.

lab_dat = data.frame(group = names(lm_equation(tips)),
                     tip = min(tips$tip),
                     total_bill = max(tips$total_bill - 10),
                     label = lm_equation(tips))
lab_dat

                  group tip total_bill                                                            label
Female.Fri   Female.Fri   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri       Male.Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat   Female.Sat   1      40.81  ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat       Male.Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun   Female.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun       Male.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female.Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur     Male.Thur   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"

Then you need to take the group variable, which has combined sex and day, and split it back into two separate variables. I use separate() from package tidyr for this. The new variables should be named the same as the variables in the original dataset, since these are the faceting variables and need to be present in a dataset used for any plotting layer.

library(tidyr)
lab_dat = separate(lab_dat, group, c("sex", "day"))
lab_dat

               sex  day tip total_bill                                                            label
Female.Fri  Female  Fri   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri      Male  Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat  Female  Sat   1      40.81  ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat      Male  Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun  Female  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun      Male  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur     Male Thur   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"

Now you can plot one label per facet, using lab_dat for the geom_text() layer.

ggplot(tips, aes(tip, total_bill)) +
     geom_point(size = 5, alpha = 0.9)+
     geom_smooth(method=lm, colour = "#000000", se = FALSE)+
     facet_grid(sex ~ day, scales = "free")+
     geom_text(data = lab_dat, aes(label = label), parse = TRUE, 
               vjust = "inward", hjust = "inward")
aosmith
  • 34,856
  • 9
  • 84
  • 118