Is there a way to add confidence intervals to a qqplot?

I have a dataset of gene expression values, which I've visualized using PCA:

pca1 = prcomp(data, scale. = TRUE)

I'm now looking for outliers by checking the distribution of the data against the normal distribution through:

qqnorm(pca1$x,pch = 20, col = c(rep("red", 73), rep("blue", 33)))

qqline(pca1$x)

This is my data:

data = [2.48 104 4.25 219 0.682 0.302 1.09 0.586 90.7 344 13.8 1.17 305 2.8 79.7 3.18 109 0.932 562 0.958 1.87 0.59 114 391 13.5 1.41 208 2.37 166 3.42]

I would now like to plot 95% confidence intervals to check which data points lie outside. Any tips on how to do this?

user2846211
  • So you want to subtract your sample distribution from the theoretical normal distribution? Sounds like what you want to do is use `nls` to fit your data to a normal dist function and grab the confidence data from the output of `nls` . – Carl Witthoft Oct 11 '13 at 11:28
  • You are much more likely to receive a helpful answer if you provide a [minimal, reproducible data set](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) together with the code you have tried. Thanks! – Henrik Oct 11 '13 at 11:33
  • I've edited my initial post with some data. Is it possible to grab the confidence data from the output of qqnorm? – user2846211 Oct 11 '13 at 11:48
  • Please read the link I posted and format your sample data accordingly. Thanks. – Henrik Oct 11 '13 at 12:10
  • Try cutting and pasting your data into the R command line. Doesn't work. Use `dput` and assign to `data <-` – Tyler Rinker Oct 11 '13 at 12:32
  • I don't think you understand what you want to ask. The data points themselves exceed 95% Confidence Interval if they lie outside some range `-k*sigma` to `+k*sigma` . The QQ plot only shows how the overall distribution differs from a selected normal distribution (with some specified mean and sigma). The only thing you could overplot is some sort of "confidence interval" that your data are in fact normally distributed, not which data points are outliers. – Carl Witthoft Oct 11 '13 at 14:09

1 Answer

The car package provides the function qqPlot(), which by default adds a pointwise confidence envelope to the normal Q-Q plot:

library(car)
qqPlot(pca1$x)
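
For readers who want to see how such an envelope can be built, or who want to list the points falling outside it, here is a minimal base-R sketch. It uses one common construction (a line through the quartiles, plus an approximate standard error for normal order statistics); this is an assumption about a reasonable method and is not claimed to match qqPlot's internals exactly. The helper name qq_envelope and the variable expr_values are hypothetical, and the sketch is applied to the raw values from the question rather than to pca1$x so that it runs on its own.

```r
# Hypothetical helper: pointwise envelope for a normal Q-Q plot.
# Assumes the reference line passes through the first and third quartiles
# and uses the approximate standard error of normal order statistics.
qq_envelope <- function(x, level = 0.95) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  p <- ppoints(n)                 # plotting positions
  z <- qnorm(p)                   # theoretical normal quantiles

  # Line through the quartiles of the data and of the normal distribution
  qx <- quantile(x, c(0.25, 0.75))
  qz <- qnorm(c(0.25, 0.75))
  slope <- unname(diff(qx) / diff(qz))
  intercept <- unname(qx[1] - slope * qz[1])

  # Approximate standard error of each order statistic
  se <- (slope / dnorm(z)) * sqrt(p * (1 - p) / n)
  zcrit <- qnorm(1 - (1 - level) / 2)
  fit <- intercept + slope * z

  data.frame(
    theoretical = z,
    observed    = x,
    fit         = fit,
    lower       = fit - zcrit * se,
    upper       = fit + zcrit * se,
    outside     = x < fit - zcrit * se | x > fit + zcrit * se
  )
}

# Values copied from the question (variable name is illustrative)
expr_values <- c(2.48, 104, 4.25, 219, 0.682, 0.302, 1.09, 0.586, 90.7, 344,
                 13.8, 1.17, 305, 2.8, 79.7, 3.18, 109, 0.932, 562, 0.958,
                 1.87, 0.59, 114, 391, 13.5, 1.41, 208, 2.37, 166, 3.42)

env <- qq_envelope(expr_values)

# Draw the Q-Q plot, the reference line, and the dashed envelope
plot(env$theoretical, env$observed, pch = 20,
     col = ifelse(env$outside, "red", "black"),
     ylim = range(env$observed, env$lower, env$upper),
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
lines(env$theoretical, env$fit)
lines(env$theoretical, env$lower, lty = 2)
lines(env$theoretical, env$upper, lty = 2)

# Points lying outside the pointwise envelope
env[env$outside, c("theoretical", "observed")]
```

Because the envelope is pointwise, a point outside it suggests departure from normality at that quantile; as noted in the comments above, it is not by itself a test for outliers.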
sieste