An earlier question showed how to make a qqplot with a qqline in ggplot2, but the answer only seems to work when the entire dataset is plotted in a single graph.

I want a way to quickly compare these plots for subsets of my data. That is, I want to make qqplots with qqlines on a graph with facets. In the following example, there should be a line in each of the 9 panels, each with its own intercept and slope.

library(ggplot2)

# all three columns have 1000 rows, so nothing is silently recycled
df1 <- data.frame(x = rnorm(1000, 10),
                  y = sample(LETTERS[1:3], 1000, replace = TRUE),
                  z = sample(letters[1:3], 1000, replace = TRUE))

ggplot(df1, aes(sample = x)) +
  stat_qq() +
  facet_grid(y ~ z)

Nick

2 Answers

You can calculate the theoretical quantiles for each group yourself and then facet the plot:

library(plyr)
library(ggplot2)

# create some data
set.seed(123)
df1 <- data.frame(vals = rnorm(1000, 10),
                  y = sample(LETTERS[1:3], 1000, replace = TRUE),
                  z = sample(letters[1:3], 1000, replace = TRUE))

# calculate the normal theoretical quantiles per group
df2 <- ddply(.data = df1, .variables = .(y, z), function(dat) {
  # qqnorm() returns the theoretical quantiles in $x, in the order of the input
  q <- qqnorm(dat$vals, plot.it = FALSE)
  dat$xq <- q$x
  dat
})

# plot the sample values against the theoretical quantiles
ggplot(data = df2, aes(x = xq, y = vals)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  xlab("Theoretical") +
  ylab("Sample") +
  facet_grid(y ~ z)

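Note that `geom_smooth(method = "lm")` fits a least-squares line, whereas base R's `qqline()` draws the line through the first and third quartiles. If the quartile-based line is wanted in every facet, a minimal sketch could look like the following. The names `qq_line_coef` and `lines_df` are hypothetical and introduced only for this illustration.

```r
library(ggplot2)

set.seed(123)
df1 <- data.frame(vals = rnorm(1000, 10),
                  y = sample(LETTERS[1:3], 1000, replace = TRUE),
                  z = sample(letters[1:3], 1000, replace = TRUE))

# Hypothetical helper: slope and intercept of the line through the first
# and third quartiles, the construction qqline() uses by default.
qq_line_coef <- function(v) {
  sample_q <- quantile(v, c(0.25, 0.75), names = FALSE)
  theory_q <- qnorm(c(0.25, 0.75))
  slope <- diff(sample_q) / diff(theory_q)
  data.frame(slope = slope, intercept = sample_q[1] - slope * theory_q[1])
}

# One row of coefficients per y/z combination
groups <- split(df1, list(df1$y, df1$z), drop = TRUE)
lines_df <- do.call(rbind, lapply(groups, function(d) {
  data.frame(y = d$y[1], z = d$z[1], qq_line_coef(d$vals))
}))

# lines_df carries the facet variables, so each panel gets its own line
ggplot(df1, aes(sample = vals)) +
  stat_qq() +
  geom_abline(data = lines_df, aes(slope = slope, intercept = intercept)) +
  facet_grid(y ~ z)
```

In ggplot2 3.0.0 and later, adding `stat_qq_line()` after `stat_qq()` should give the same kind of per-panel line without the manual step.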

Henrik

For no good reason, here's the dplyr (which didn't exist at the time of this question) version of the same thing. In the interest of peer review and comparison, I'll provide code that generates the data sets so that you can inspect them further.

# create some data
set.seed(123)
df1 <- data.frame(vals = rnorm(1000, 10),
                  y = sample(LETTERS[1:3], 1000, replace = TRUE),
                  z = sample(letters[1:3], 1000, replace = TRUE))

#* Henrik's plyr version
library(plyr)
df2 <- plyr::ddply(.data = df1, .variables = .(y, z), function(dat) {
  q <- qqnorm(dat$vals, plot.it = FALSE)
  dat$xq <- q$x
  dat
})

detach("package:plyr")


#* The dplyr version
library(dplyr)
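qqnorm_data <- function(x){
  # data frame of theoretical (x) and sample (y) quantiles
  Q <- as.data.frame(qqnorm(x, plot.it = FALSE))
  # name the sample column after the variable that was passed in
  names(Q) <- c("xq", deparse(substitute(x)))
  Q
}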

df3 <- df1 %>%
  group_by(y, z) %>%
  do(with(., qqnorm_data(vals)))

The plotting can be done with the same ggplot code from Henrik's answer, with df3 in place of df2.
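As a rough sketch of the whole pipeline in one place, and assuming a recent dplyr in which `do()` is superseded, the theoretical quantiles can also be added with `group_by()` plus `mutate()` and then plotted with the same ggplot call. The name `df4` is hypothetical and used only for this illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
df1 <- data.frame(vals = rnorm(1000, 10),
                  y = sample(LETTERS[1:3], 1000, replace = TRUE),
                  z = sample(letters[1:3], 1000, replace = TRUE))

# df4 (hypothetical name): df1 plus the theoretical normal quantiles,
# computed separately within each y/z group
df4 <- df1 %>%
  group_by(y, z) %>%
  mutate(xq = qqnorm(vals, plot.it = FALSE)$x) %>%
  ungroup()

# same plot as before: sample values against theoretical quantiles,
# with a least-squares line in each facet
ggplot(df4, aes(x = xq, y = vals)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  xlab("Theoretical") +
  ylab("Sample") +
  facet_grid(y ~ z)
```

Because `qqnorm()` returns its theoretical quantiles in the same order as the input, the new column lines up row by row with `vals`, and all original columns are kept.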

Benjamin