0

This is my first time posting, so I apologize for any weird formatting issues, especially with my R code.

First, I am trying to get 95% confidence interval curves around the two curves in each panel of the figure reproduced with the code below. It seems like when I have ggplot(look, aes(x=treatment, y=fit, color=generation)) and facet_wrap(~population), I can't get the 95% confidence intervals to work with geom_smooth (not included in the code below because it doesn't do anything). geom_smooth only works when I have ggplot(look, aes(x=treatment, y=fit, color=population)) and facet_wrap(~population).

Please let me know if anything needs clarification!

Here's the visual output I get. I am trying to get geom_smooth to add confidence intervals to each curve.

ggplot output

These are the types of intervals I'm trying to add:

geom_smooth example

library(ggplot2)

germData <- read.csv("https://dl.dropbox.com/s/h75rzzavpoxb8q1/processed_germData%20-%20no%20GA%20-%20continuous.csv")

totalViable    <- germData$totalGoodSeeds   
totalGerm      <- germData$totalGerm        

y <- cbind(totalGerm, totalViable-totalGerm)

germData$germRate <- germData$totalGerm/germData$totalGoodSeeds

q1  <- glm(y ~ population + generation + treatment + population:treatment + population:generation + generation:treatment + population:generation:treatment, germData, family=quasibinomial)

xv <- seq(0, 20, .2)
pfl <- length(xv)

look <- data.frame(population = factor(rep(levels(germData$population), each = 2*pfl)), generation = factor(rep(levels(germData$generation), each = pfl)), treatment = xv)

pred <- predict(q1, newdata = look, type= "response", se.fit = TRUE)
look$fit <- pred$fit
look$se.fit <- pred$se.fit

GAintercepts <- read.csv("https://dl.dropbox.com/s/jeei9yst7h4l37q/predict_germData.csv")

GAinterceptpoints <- GAintercepts[GAintercepts$treatment=="GA3",]
GAinterceptpoints$intercept <- "0" 
GAinterceptpoints$intercept <- as.numeric(GAinterceptpoints$intercept)

bigPlot <- ggplot(look, aes(x=treatment, y=fit, color=generation)) + 
    geom_line() + 
    geom_jitter(width=0.02, inherit.aes = FALSE, data=germData, 
    aes(x=treatment, y=germRate, color=generation), show.legend = FALSE) 

bigPlotPlus <- bigPlot + 
    geom_smooth() + 
    scale_color_manual(name="generation", values=c("#0f71bc", "#a3e1ff")) + 
    geom_pointrange(inherit.aes=FALSE, data=GAinterceptpoints, 
    aes(x=intercept, y=fit.germRate, ymin=fit.germRate-se, 
    ymax=fit.germRate+se, color=generation)) + 
    scale_color_manual(name="generation", values=c("#0f71bc", "#a3e1ff")) + 
    facet_wrap(~population) 

bigPlotPlus
MLavoie
  • 9,671
  • 41
  • 36
  • 56
Katie K
  • 1
  • 2
  • 1
    If you don't get an answer soon, consider editing the question by stripping out a lot of the elements that aren't critical to your question (like the theme and coordinates). It will help readers, but it may also help you see the problem first. This is also a good [resource](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1). – wibeasley Mar 17 '18 at 02:57

1 Answers1

2

Without seeing a sketch of your desired output, I may be misunderstanding what you're going for. If these don't match it, then please sketch what you're going for. I think the problem is that you need to explicitly define the group parameter to be a unique combination of population and generation.

library(magrittr)
library(ggplot2)

germDataOriginal <- read.csv("https://dl.dropbox.com/s/h75rzzavpoxb8q1/processed_germData%20-%20no%20GA%20-%20continuous.csv")

germData <- germDataOriginal %>% 
  dplyr::select(
    population, 
    generation, 
    treatment,
    totalGoodSeeds,  
    totalGerm,       
    totalColdGerm   
  ) %>% 
  dplyr::mutate(
    difference  = totalViable - totalGerm,
    germRate = totalGerm / totalGoodSeeds
  )

q1 <- glm(
  cbind(germData$totalGerm, germData$difference) 
  ~ 
  population + generation + treatment + population*generation*treatment, germData, family=quasibinomial
)

look <- tidyr::crossing(
  population  = germData$population, 
  generation  = germData$generation,
  treatment   = seq(0, 20, by=5)
) %>% 
dplyr::mutate(
  pop_gen    = paste0(population, "-", generation)
)

pred <- predict(q1, newdata = look, type= "response", se.fit = TRUE)
look$fit <- pred$fit
look$se.fit <- pred$se.fit

GAinterceptpoints <- read.csv("https://dl.dropbox.com/s/jeei9yst7h4l37q/predict_germData.csv") %>% 
  dplyr::filter(treatment=="GA3") %>% 
  dplyr::mutate(
    intercept = 0
  )

bigPlot <- ggplot(look, aes(x=treatment, y=fit, color=generation)) + 
  geom_line(aes(group=pop_gen)) +
  geom_point(aes(x=treatment, y=germRate, colour=generation), data=germData, inherit.aes = FALSE, shape=1)
bigPlot

bigPlot + 
  geom_pointrange(
    aes(x=intercept,y=fit.germRate, ymin=fit.germRate-se, ymax=fit.germRate+se, color=generation),
    inherit.aes=FALSE, data=GAinterceptpoints
  ) 

enter image description here

last_plot() + 
  facet_wrap(~population) 

enter image description here

For reference, your original bigPlot looks like

enter image description here

Notice I removed more cosmetics. I liked what you had better, but they're likely distracting from your fundamental question. Also, I'd separate that question about color scales, because it appears mostly unrelated. Also, in the future, simplify the incoming datasets to just the minimal amount of rows and columns. In order to address your fundamental question, there's no real need to calculate the germRate in the question's code; it's distracting.

wibeasley
  • 5,000
  • 3
  • 34
  • 62
  • Thank you for all your help and suggestions- I'm definitely still learning. I updated my post to include images of my output and my desired output, but you did solve one of the problems I was having. Explicitly defining a "pop_gen" group allowed me to use geom_smooth to add the confidence interval curves. The confidence interval curves aren't the way I'd ideally want them to look (too linear compared to the curves themselves) but I'm going to try to figure it out. – Katie K Mar 17 '18 at 15:03