1

I need to do something similar to what's shown in this excellent question:

Q-Q plot with ggplot2::stat_qq, colours, single group

but unfortunately there's a slight difference which is blocking me. Unlike the original question, I do want to separate the quantile computations by group, but I also want to add a QQ-line for each group. Following the OP's code, I can create the quantile-quantile plots by group:

library(dplyr)
library(ggplot2)
library(broom) ## for augment()

set.seed(1001)
N <- 1000
G <- 10
dd <- data_frame(x = runif(N),
                 group = factor(sample(LETTERS[1:G], size=N, replace=TRUE)),
                 y = rnorm(N) + 2*x + as.numeric(group))
m1 <- lm(y~x, data=dd)
dda <- cbind(augment(m1), group=dd$group)
sample_var <- "y"
group_var  <- "group"
p <- ggplot(dda)+stat_qq(aes_string(sample=sample_var, colour=group_var))
p

enter image description here

How can I add the quantile-quantile lines for each group? NOTE: ideally I would like to specify the sample column and the group column at runtime. That's why I used aes_string.

EDIT to better clarify my problem, I add code to compute quantile-quantile lines when there's only one group. I need to generalize the code to multiple groups.

library(dplyr)
library(ggplot2)
library(broom) ## for augment()

# this section of the code is the same as before, EXCEPT G = 1, because for 
# now the code only works for 1 group
set.seed(1001)
N <- 1000
G <- 1
dd <- data_frame(x = runif(N),
                 group = factor(sample(LETTERS[1:G], size=N, replace=TRUE)),
                 y = rnorm(N) + 2*x + as.numeric(group))
m1 <- lm(y~x, data=dd)
dda <- cbind(augment(m1), group=dd$group)
sample_var <- "y"
group_var  <- "group"

# code to compute the slope and the intercept of the qq-line: basically,
# I would need to compute the slopes and the intercepts of the qq-lines
# for each group
vec <- dda[, sample_var]
y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
x <- qnorm(c(0.25, 0.75))
slope <- diff(y)/diff(x)
int <- y[1] - slope * x[1]

# now plot with ggplot2
p <- ggplot(dda)+stat_qq(aes_string(sample=sample_var, colour=group_var))+geom_abline(slope = slope, intercept = int)
p

enter image description here

Community
  • 1
  • 1
DeltaIV
  • 4,773
  • 12
  • 39
  • 86
  • @BenBolker since you had a similar problem, maybe you can offer some hints? :) – DeltaIV May 02 '17 at 15:57
  • 1
    do you mean `p+stat_qq(aes_string(sample=sample_var, colour=group_var), geom = "line") ` ? – mtoto May 02 '17 at 16:09
  • @mtoto no! That just links the plots in each group with a line. I want to plot quantile-quantile lines. I'm adding code which works for one group, so you can have an idea what I need to do for more that one group. – DeltaIV May 02 '17 at 16:34

2 Answers2

3

Turning the code to calculate the qqlines into a function and then using lapply to create a separate data.frame for your qqlines is one approach.

library(dplyr)
library(ggplot2)
library(broom) ## for augment()

set.seed(1001)
N <- 1000
G <- 3
dd <- data_frame(x = runif(N),
                 group = factor(sample(LETTERS[1:G], size=N, replace=TRUE)),
                 y = rnorm(N) + 2*x + as.numeric(group))
m1 <- lm(y~x, data=dd)
dda <- cbind(augment(m1), group=dd$group)
sample_var <- "y"
group_var  <- "group"

# code to compute the slope and the intercept of the qq-line 

qqlines <- function(vec, group) {
    x <- qnorm(c(0.25, 0.75))    
    y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
    slope <- diff(y)/diff(x)
    int <- y[1] - slope * x[1]
    data.frame(slope, int, group)
}


slopedf <- do.call(rbind,lapply(unique(dda$group), function(grp) qqlines(dda[dda$group == grp,sample_var], grp)))



# now plot with ggplot2
p <- ggplot(dda)+stat_qq(aes_string(sample=sample_var, colour=group_var)) + 
    geom_abline(data = slopedf, aes(slope = slope, intercept = int, colour = group))
p

enter image description here

Jeremy Voisey
  • 1,257
  • 9
  • 13
  • Nice! I would have used a couple of support vectors `used_groups <- unique(dda$group)` and `samples <- dda[dda$group == grp,sample_var]` to make the code a little more readable, but really fine anyway. It will be easy to turn this in a reusable function. +1! – DeltaIV May 03 '17 at 08:04
  • 1
    Glad it's useful. You're right, my code does need a bit of work. But in my excuse, it was past my bedtime, when I finished it :-) – Jeremy Voisey May 03 '17 at 08:09
1

A more concise alternative. In ggplot2 v.3.0.0 and above you can use stat_qq_line:

ggplot(dda, aes(sample = y, colour = group)) +
  stat_qq() +
  stat_qq_line()

Output:

enter image description here

Data, from Jeremy Voisey's answer:

library(ggplot2)
library(broom)
set.seed(1001)
N <- 1000
G <- 3
dd <- data_frame(
  x = runif(N),
  group = factor(sample(LETTERS[1:G], size = N, replace = TRUE)),
  y = rnorm(N) + 2 * x + as.numeric(group)
)
m1 <- lm(y ~ x, data = dd)
dda <- cbind(augment(m1), group = dd$group)
mpalanco
  • 12,960
  • 2
  • 59
  • 67