61

Say have a linear model LM that I want a qq plot of the residuals. Normally I would use the R base graphics:

qqnorm(residuals(LM), ylab="Residuals")
qqline(residuals(LM))

I can figure out how to get the qqnorm part of the plot, but I can't seem to manage the qqline:

ggplot(LM, aes(sample=.resid)) +
    stat_qq()

I suspect I'm missing something pretty basic, but it seems like there ought to be an easy way of doing this.

EDIT: Many thanks for the solution below. I've modified the code (very slightly) to extract the information from the linear model so that the plot works like the convenience plot in the R base graphics package.

ggQQ <- function(LM) # argument: a linear model
{
    y <- quantile(LM$resid[!is.na(LM$resid)], c(0.25, 0.75))
    x <- qnorm(c(0.25, 0.75))
    slope <- diff(y)/diff(x)
    int <- y[1L] - slope * x[1L]
    p <- ggplot(LM, aes(sample=.resid)) +
        stat_qq(alpha = 0.5) +
        geom_abline(slope = slope, intercept = int, color="blue")

    return(p)
}
Mike Wise
  • 22,131
  • 8
  • 81
  • 104
Peter
  • 4,219
  • 4
  • 28
  • 40
  • Note: for residuals, ggplot needs **standardized** residuals. See jlhoward's answer https://stackoverflow.com/a/19990107/3163618 – qwr Oct 25 '18 at 19:47

8 Answers8

54

The following code will give you the plot you want. The ggplot package doesn't seem to contain code for calculating the parameters of the qqline, so I don't know if it's possible to achieve such a plot in a (comprehensible) one-liner.

qqplot.data <- function (vec) # argument: vector of numbers
{
  # following four lines from base R's qqline()
  y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
  x <- qnorm(c(0.25, 0.75))
  slope <- diff(y)/diff(x)
  int <- y[1L] - slope * x[1L]

  d <- data.frame(resids = vec)

  ggplot(d, aes(sample = resids)) + stat_qq() + geom_abline(slope = slope, intercept = int)

}
Aaron
  • 816
  • 7
  • 4
  • Works perfectly! I took the liberty of slightly modifying the code to extract the vector directly from a linear model. Of course your solution will work with data that isn't in the form of a linear model, but I thought someone else might want a convenience function for building a qqplot from a LM. – Peter Dec 05 '10 at 14:59
23

You can also add confidence Intervals/confidence bands with this function (Parts of the code copied from car:::qqPlot)

gg_qq <- function(x, distribution = "norm", ..., line.estimate = NULL, conf = 0.95,
                  labels = names(x)){
  q.function <- eval(parse(text = paste0("q", distribution)))
  d.function <- eval(parse(text = paste0("d", distribution)))
  x <- na.omit(x)
  ord <- order(x)
  n <- length(x)
  P <- ppoints(length(x))
  df <- data.frame(ord.x = x[ord], z = q.function(P, ...))

  if(is.null(line.estimate)){
    Q.x <- quantile(df$ord.x, c(0.25, 0.75))
    Q.z <- q.function(c(0.25, 0.75), ...)
    b <- diff(Q.x)/diff(Q.z)
    coef <- c(Q.x[1] - b * Q.z[1], b)
  } else {
    coef <- coef(line.estimate(ord.x ~ z))
  }

  zz <- qnorm(1 - (1 - conf)/2)
  SE <- (coef[2]/d.function(df$z)) * sqrt(P * (1 - P)/n)
  fit.value <- coef[1] + coef[2] * df$z
  df$upper <- fit.value + zz * SE
  df$lower <- fit.value - zz * SE

  if(!is.null(labels)){ 
    df$label <- ifelse(df$ord.x > df$upper | df$ord.x < df$lower, labels[ord],"")
    }

  p <- ggplot(df, aes(x=z, y=ord.x)) +
    geom_point() + 
    geom_abline(intercept = coef[1], slope = coef[2]) +
    geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.2) 
  if(!is.null(labels)) p <- p + geom_text( aes(label = label))
  print(p)
  coef
}

Example:

Animals2 <- data(Animals2, package = "robustbase")
mod.lm <- lm(log(Animals2$brain) ~ log(Animals2$body))
x <- rstudent(mod.lm)
gg_qq(x)

enter image description here

Rentrop
  • 20,979
  • 10
  • 72
  • 100
  • 1
    This is super helpful. Have you thought about hosting your script on Github? It would be nice to properly site your code, – Hip Hop Physician Sep 26 '15 at 05:21
  • 1
    https://gist.github.com/rentrop/d39a8406ad8af2a1066c like this? Even tho i dont know why you can't site SO... – Rentrop Sep 26 '15 at 06:29
  • 3
    Thanks a lot! I suppose I misspoke slightly, what I meant it would be nice to have it posted on Github so can I bring it in as part of an R script (instead of finding a way to splice your Stack Overflow post.) – Hip Hop Physician Sep 28 '15 at 15:00
16

Since version 3.0, a stat_qq_line equivalent to the below has found its way into the official ggplot2 code.


Old version:

Since version 2.0, ggplot2 has a well-documented interface for extension; so we can now easily write a new stat for the qqline by itself (which I've done for the first time, so improvements are welcome):

qq.line <- function(data, qf, na.rm) {
    # from stackoverflow.com/a/4357932/1346276
    q.sample <- quantile(data, c(0.25, 0.75), na.rm = na.rm)
    q.theory <- qf(c(0.25, 0.75))
    slope <- diff(q.sample) / diff(q.theory)
    intercept <- q.sample[1] - slope * q.theory[1]

    list(slope = slope, intercept = intercept)
}

StatQQLine <- ggproto("StatQQLine", Stat,
    # http://docs.ggplot2.org/current/vignettes/extending-ggplot2.html
    # https://github.com/hadley/ggplot2/blob/master/R/stat-qq.r
    
    required_aes = c('sample'),
    
    compute_group = function(data, scales,
                             distribution = stats::qnorm,
                             dparams = list(),
                             na.rm = FALSE) {
        qf <- function(p) do.call(distribution, c(list(p = p), dparams))
        
        n <- length(data$sample)
        theoretical <- qf(stats::ppoints(n))
        qq <- qq.line(data$sample, qf = qf, na.rm = na.rm)
        line <- qq$intercept + theoretical * qq$slope

        data.frame(x = theoretical, y = line)
    } 
)

stat_qqline <- function(mapping = NULL, data = NULL, geom = "line",
                        position = "identity", ...,
                        distribution = stats::qnorm,
                        dparams = list(),
                        na.rm = FALSE,
                        show.legend = NA, 
                        inherit.aes = TRUE) {
    layer(stat = StatQQLine, data = data, mapping = mapping, geom = geom,
          position = position, show.legend = show.legend, inherit.aes = inherit.aes,
          params = list(distribution = distribution,
                        dparams = dparams,
                        na.rm = na.rm, ...))
}

This also generalizes over the distribution (exactly like stat_qq does), and can be used as follows:

> test.data <- data.frame(sample=rnorm(100, 10, 2)) # normal distribution
> test.data.2 <- data.frame(sample=rt(100, df=2))   # t distribution
> ggplot(test.data, aes(sample=sample)) + stat_qq() + stat_qqline()
> ggplot(test.data.2, aes(sample=sample)) + stat_qq(distribution=qt, dparams=list(df=2)) +
+   stat_qqline(distribution=qt, dparams=list(df=2))

(Unfortunately, since the qqline is on a separate layer, I couldn't find a way to "reuse" the distribution parameters, but that should only be a minor problem.)

phipsgabler
  • 20,535
  • 4
  • 40
  • 60
14

The standard Q-Q diagnostic for linear models plots quantiles of the standardized residuals vs. theoretical quantiles of N(0,1). @Peter's ggQQ function plots the residuals. The snippet below amends that and adds a few cosmetic changes to make the plot more like what one would get from plot(lm(...)).

ggQQ = function(lm) {
  # extract standardized residuals from the fit
  d <- data.frame(std.resid = rstandard(lm))
  # calculate 1Q/4Q line
  y <- quantile(d$std.resid[!is.na(d$std.resid)], c(0.25, 0.75))
  x <- qnorm(c(0.25, 0.75))
  slope <- diff(y)/diff(x)
  int <- y[1L] - slope * x[1L]

  p <- ggplot(data=d, aes(sample=std.resid)) +
    stat_qq(shape=1, size=3) +           # open circles
    labs(title="Normal Q-Q",             # plot title
         x="Theoretical Quantiles",      # x-axis label
         y="Standardized Residuals") +   # y-axis label
    geom_abline(slope = slope, intercept = int, linetype="dashed")  # dashed reference line
  return(p)
}

Example of use:

# sample data (y = x + N(0,1), x in [1,100])
df <- data.frame(cbind(x=c(1:100),y=c(1:100+rnorm(100))))
ggQQ(lm(y~x,data=df))
jlhoward
  • 58,004
  • 7
  • 97
  • 140
11

With the latest ggplot2 version (>=3.0), new function stat_qq_line is implemented (https://github.com/tidyverse/ggplot2/blob/master/NEWS.md) and a qq line for model residuals can be added with:

library(ggplot2)
model <- lm(mpg ~ wt, data=mtcars)
ggplot(model, aes(sample = rstandard(model))) + geom_qq() + stat_qq_line()

rstandard(model) is needed to get the standardized residual. (credit @jlhoward and @qwr)

If you get an 'Error in stat_qq_line() : could not find function "stat_qq_line"', your ggplot2 version is too old and you can fix it by upgrading your ggplot2 package: install.packages("ggplot2") .

LmW.
  • 1,364
  • 9
  • 16
  • This is now released in stable version ggplot2 3.0.0. So you can now get this feature from the CRAN version. – Droplet Jul 27 '18 at 19:02
  • Ggplot expects **standardized** residuals as jlhoward describes. So use `rstandard(model)` instead of `.resid` – qwr Oct 12 '18 at 06:39
9

Why not the following?

Given some vector, say,

myresiduals <- rnorm(100) ^ 2

ggplot(data=as.data.frame(qqnorm( myresiduals , plot=F)), mapping=aes(x=x, y=y)) + 
    geom_point() + geom_smooth(method="lm", se=FALSE)

But it seems strange that we have to use a traditional graphics function to prop up ggplot2.

Can't we get the same effect somehow by starting with the vector for which we want the quantile plot and then applying the appropriate "stat" and "geom" functions in ggplot2?

Does Hadley Wickham monitor these posts? Maybe he can show us a better way.

joran
  • 169,992
  • 32
  • 429
  • 468
Jacob Wegelin
  • 1,304
  • 11
  • 16
  • 1
    the scatter plot resembles the q-q plot of the qqnorm() but the line added by geom_smooth is not same as the one given by qqline(). The solutions given by Aaron and @jlhoward, on the other hand , give plots similar to the base R ones. Can you comment if it is my data, because of which it is misbehaving. – ktyagi Apr 16 '14 at 15:14
4

You could steal a page from the old-timers who did this stuff with normal probability paper. A careful look at a ggplot()+stat_qq() graphic suggests that a reference line can be added with geom_abline(), like this

df <- data.frame( y=rpois(100, 4) )

ggplot(df, aes(sample=y)) +
  stat_qq() +
  geom_abline(intercept=mean(df$y), slope = sd(df$y))
  • Shouldn't a Q-Q plot with sample vs theoretical quantiles have reference line y=x? – qwr Feb 05 '19 at 21:23
1

ggplot2 v.3.0.0 now has an qqline stat. From the help page:

df <- data.frame(y = rt(200, df = 5))
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()

!ggplot2 v3.0.0 Example stats equivalent to qqnorm plus abline]1

Richard Careaga
  • 628
  • 1
  • 5
  • 11