
[Image: columns and first rows of the data]

I have several different geom_smooth(method = "glm") lines in the same geom_point graph in ggplot2. I'm looking to determine the regression equation for each line, including the slope. I found a similar post but I'm still having some problems. My code is:

library(ggplot2)

native <- read.csv("native.gather.C4C5C6C7.csv")

ggplot(native, aes(x=YearsPostRelease, y=PercentNative, col=FieldType, linetype=FieldType)) + 
    geom_point(size=0.7) + 
    geom_smooth(data = native, 
                method ="glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
    scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55)) +
    scale_y_continuous(limits = c(0, 100), 
                       breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)) + 
    ggtitle("Percent Native Through Time")

Thanks in advance!

btp7rr
  • Please include the link to the similar post; it helps for reference. It would also be helpful to see what your data look like, so please include a sample. – OTStats Jan 08 '19 at 18:12
  • Here is the link: https://stackoverflow.com/questions/7549694/adding-regression-line-equation-and-r2-on-graph I also added an image of the data. Sorry, first-time poster on this site. Thanks – btp7rr Jan 08 '19 at 18:15
  • What lines do you want to add exactly? You can add as many `geom_smooth()` layers as you like. It's not clear to me what exactly your question is. – MrFlick Jan 08 '19 at 18:17
  • I have the lines already added with that code; I'm looking to determine the slope of each line for further statistical analysis. I can't use the r2 values because the sample sizes behind each line are very different. – btp7rr Jan 08 '19 at 18:19
  • So in other words, you'd like the regression equation details? – OTStats Jan 08 '19 at 18:25
  • Yes, exactly. I'm looking for the regression equation details of each of the lines in the graph. – btp7rr Jan 08 '19 at 18:26
  • @btp7rr Your question seems like a duplicate of [Adding Regression Line Equation and R2 on graph](https://stackoverflow.com/questions/7549694/adding-regression-line-equation-and-r2-on-graph). Do the answers provided in that post help? – Maurits Evers Jan 08 '19 at 19:20
  • It is very similar, but no solution included a sample with multiple regression equations on a single plot (which I assume is also what's being asked here). – OTStats Jan 08 '19 at 19:28
  • @MauritsEvers that is the post I tagged in my question but had no luck with those methods. As OTStats mentioned, my specific graph has multiple regression lines on it. – btp7rr Jan 08 '19 at 20:32
  • @btp7rr aah my bad. Should’ve read your post more carefully. – Maurits Evers Jan 08 '19 at 22:40
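Since the goal is the slope of each line for further analysis rather than a label on the plot, one option is to fit one lm() per group and collect the coefficients in a table. This is a minimal sketch using mtcars as a stand-in, because the question's CSV is not available. With method = "glm" and its default Gaussian family, geom_smooth() fits ordinary least squares, so these coefficients should match the drawn lines. The slope standard error and group size n are included because the comments mention very different sample sizes.

```r
# Stand-in data: mtcars ships with R; the question's native data frame is not available here
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# One row per group: intercept, slope, slope standard error, r^2 and sample size
coef_table <- do.call(rbind, lapply(names(fits), function(g) {
  m <- fits[[g]]
  s <- summary(m)
  data.frame(cyl       = g,
             intercept = unname(coef(m)[1]),
             slope     = unname(coef(m)[2]),
             slope_se  = s$coefficients[2, "Std. Error"],
             r.squared = s$r.squared,
             n         = nobs(m),
             stringsAsFactors = FALSE)
}))
print(coef_table)
```

For the question's data, the untested equivalent would swap in native, PercentNative ~ YearsPostRelease, and split(native, native$FieldType).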

2 Answers


Here's an approach using lm_eqn as defined in the linked question. You probably ran into issues because your data don't match the input the function expects: it hard-codes the column names in its model formula. I used mtcars here since I don't have your data, exploring the relationship between mpg and wt across cyl groups. Note below that the formula inside lm_eqn is customized to the relationship I am investigating.

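lm_eqn <- function(df){
  # The model formula is hard-coded for this example: mpg as a function of wt
  m <- lm(mpg ~ wt, df)
  # Build the plotmath expression "mpg = a + b . wt, r^2 = r2"; unname() keeps
  # the coefficient names out of the deparsed label
  eq <- substitute(italic(mpg) == a + b %.% italic(wt)*","~~italic(r)^2~"="~r2, 
                   list(a = format(unname(coef(m)[1]), digits = 2), 
                        b = format(unname(coef(m)[2]), digits = 2), 
                        r2 = format(summary(m)$r.squared, digits = 3)))
  # Return a string that annotate(..., parse = TRUE) turns back into the expression
  as.character(as.expression(eq))
}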
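Because lm_eqn hard-codes mpg ~ wt, it has to be rewritten for every dataset. A variant that takes the formula as an argument avoids that. This is a minimal sketch: the name lm_eqn_f is hypothetical, and it assumes a simple formula with exactly one response and one predictor.

```r
# Hypothetical variant of lm_eqn that takes the model formula as an argument
lm_eqn_f <- function(df, formula) {
  m <- lm(formula, data = df)
  vars <- all.vars(formula)  # c(response, predictor) for a formula like y ~ x
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                   list(y  = as.name(vars[1]),   # replaced by the response name
                        x  = as.name(vars[2]),   # replaced by the predictor name
                        a  = format(unname(coef(m)[1]), digits = 2),
                        b  = format(unname(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

lab <- lm_eqn_f(subset(mtcars, cyl == 4), mpg ~ wt)
print(lab)
```

With the question's columns, the call would look like lm_eqn_f(subset(native, FieldType == "BSSFields"), PercentNative ~ YearsPostRelease), where the level name is only a guess.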

We can apply that to manually defined subsets of the data. There's probably a smarter way to apply this to multiple groups automatically, but since it's hard to automate good label placement, this might be good enough.

library(ggplot2); library(dplyr)
ggplot(mtcars, aes(x=wt, y=mpg, 
                   col=as.factor(cyl), linetype=as.factor(cyl))) + 
  geom_point() + 
  geom_smooth(data = mtcars, 
              method ="glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
  annotate("text", x = 3, y = 30, label = lm_eqn(mtcars %>% filter(cyl == 4)), parse = TRUE) +
  annotate("text", x = 4.3, y = 20, label = lm_eqn(mtcars %>% filter(cyl == 6)), parse = TRUE) +
  annotate("text", x = 4, y = 12, label = lm_eqn(mtcars %>% filter(cyl == 8)), parse = TRUE)
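The more automatic approach mentioned above can be sketched by computing one label per group into a small data frame and drawing all of them with a single geom_text(parse = TRUE) layer. This is an alternative to the hand-placed annotate() calls, not the method used in this answer. The stacked positions in the top-right corner are an arbitrary choice and will likely need tuning. The sketch also assumes ggplot2 is installed.

```r
library(ggplot2)

# Same plotmath label as lm_eqn, for mpg ~ wt
eqn_label <- function(df) {
  m <- lm(mpg ~ wt, data = df)
  eq <- substitute(italic(mpg) == a + b %.% italic(wt)*","~~italic(r)^2~"="~r2,
                   list(a  = format(unname(coef(m)[1]), digits = 2),
                        b  = format(unname(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# One row per cyl group; labels are stacked downward from the top-right corner
groups <- split(mtcars, mtcars$cyl)
labels <- data.frame(
  cyl   = names(groups),
  wt    = max(mtcars$wt),
  mpg   = max(mtcars$mpg) - 2 * (seq_along(groups) - 1),
  label = vapply(groups, eqn_label, character(1)),
  stringsAsFactors = FALSE
)

p <- ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "glm", se = FALSE) +  # se = FALSE drops the ribbon
  # The inherited factor(cyl) also evaluates on labels$cyl, so colors match
  geom_text(data = labels, aes(label = label), parse = TRUE, hjust = 1,
            show.legend = FALSE)
print(p)
```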


OTStats
Jon Spring

Applying what Jon contributed above, you can customize this function to your data as follows.

Again, it's difficult to know exactly what your underlying data look like, but let's assume that your column FieldType has three levels: BSSFields, CSSFields and DSSFields.

# Load data
library(tidyverse)
native <- read.csv("native.gather.C4C5C6C7.csv")

# Define function
lm_eqn <- function(df){
  # Formula customized to the question's columns
  m <- lm(PercentNative ~ YearsPostRelease, df)
  eq <- substitute(italic(PercentNative) == a + b %.% italic(YearsPostRelease)*","~~italic(r)^2~"="~r2, 
                   list(a = format(unname(coef(m)[1]), digits = 2), 
                        b = format(unname(coef(m)[2]), digits = 2), 
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# Plot data
ggplot(native, aes(x = YearsPostRelease, 
                   y = PercentNative, 
                   col = FieldType, 
                   linetype = FieldType)) +
  geom_point(size=0.7) + 
  geom_smooth(data = native, 
              method ="glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
  scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55)) +
  scale_y_continuous(limits = c(0, 100), 
                     breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)) + 
  annotate("text", x = 3, y = 30, 
           label = lm_eqn(native %>% filter(FieldType == "BSSFields")), parse = TRUE) +
  annotate("text", x = 4, y = 20, 
           label = lm_eqn(native %>% filter(FieldType == "CSSFields")), parse = TRUE) +
  annotate("text", x = 5, y = 10, 
           label = lm_eqn(native %>% filter(FieldType == "DSSFields")), parse = TRUE) +
  ggtitle("Percent Native Through Time")

Note that the positions of these regression equations will have to be adjusted to the ranges of YearsPostRelease and PercentNative. Also, if FieldType has more than three levels, you'll have to add a corresponding annotate() call for each extra level, using its exact level name.

OTStats
  • I've changed the annotation to include all five levels of my FieldType. When I run the function, it is successfully added to my global environment. When I then run the updated ggplot code, no graph is created; I only see the following in the console: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases. Any ideas? – btp7rr Jan 09 '19 at 16:37
  • Do you have NA values in your data? – OTStats Jan 09 '19 at 17:52
  • I just double-checked for NA using is.na and can confirm I have no NA values in my data. Is there a way I can upload my data on the site to determine the source of the error? – btp7rr Jan 09 '19 at 19:56
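The "0 (non-NA) cases" error is raised by lm.fit() when lm() receives a data frame with no usable rows. Since the data contain no NAs, a plausible cause is that one of the filter(FieldType == "...") calls matches nothing. This is an assumption, because the data weren't shared. For example, the level names BSSFields, CSSFields and DSSFields were guesses in the answer, or the CSV values may carry stray spaces. This minimal sketch reproduces the error with mtcars, which has no 5-cylinder cars, and shows the check to run.

```r
# Filtering on a group label that does not exist returns zero rows
empty <- subset(mtcars, cyl == 5)
nrow(empty)  # 0

# lm() on a zero-row data frame fails with the message from the comment above
msg <- tryCatch(lm(mpg ~ wt, data = empty), error = conditionMessage)
print(msg)

# List the labels that really exist before writing the filter() calls
print(sort(unique(mtcars$cyl)))

# For the question's data (not run here), the equivalent checks would be:
# unique(native$FieldType)
# table(native$FieldType)
# strip.white = TRUE trims leading/trailing spaces from unquoted fields:
# native <- read.csv("native.gather.C4C5C6C7.csv", strip.white = TRUE)
```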