
I have a vector of studentized residuals and want to check whether a Normal distribution is an appropriate assumption for the error terms.

stres<-c(-1.8901289914,0.4204280426,-0.0145373478,-0.9589928480,0.2979275041,-0.0739727698,-0.2855008329, 0.7230079969, -0.2914220542, -1.4806560234,  0.1745061707, -1.0128947866, 0.2164536856, -1.1546273403, -0.1542422829,  0.4053103868, -0.5823112019,  0.4563220212, -0.2041307378, -0.6045740758, -0.3423926064,  0.5056780975, -0.3331478887, -0.7572435490,-0.2779393059, -0.6849294132,  0.1311372820, -0.3030318977,  1.2808663783,  0.1563968894,0.6010948547, -0.3774192022,  1.0438147373,  1.6050759999, -0.9890854956, -0.1287947910, 0.7271980085, -0.9227865867,  0.4821372580,  0.3399080091, -0.2351548189,  0.6355239393, 2.1741928350, -0.2974405677, -1.0528047470, -0.2669435284,  5.0818621788, -0.0088001872, 0.3437940065, -0.4497556191,  0.3016467357, -0.5614212196, -0.8108463584,  0.8338203257, -0.0004854171,  0.8685702440, -1.4909115265, -1.0530867343,  0.9150175493,  1.0730349160, 0.8118376993,  0.1817804928, -1.9965218063,  0.2532144991, -1.1466778177, -0.0578508361, 0.3446155055, -0.5806560664, -0.1169654345, -0.5275995111, -1.0649636054, -0.3887896268, -0.4160932861, -0.0934359868,  0.3515817484, -1.5149224114,  0.5314789991, -0.2125920264, -0.7261382754, -0.2765816110,  1.5836098693, -0.5722100502, -1.7568426345, -0.5378891714, -0.1135252088, -0.2024199659, -0.8113893763,  0.1124588319, -0.4545775998,  0.6401511326, 0.4593905713,  0.6382493988,  1.3572090694, -1.2216154767,  0.0863635754, -0.1183901577, -0.4117695427,  1.3132905003, -0.1426298933, -0.6680755381,  1.1076746801,  0.6213041292, -0.8735205521,  0.9021532905, -0.1517989978,  1.0997109361,  0.0038767275, 0.0967747416, 0.5100796248, -0.2174347692, -1.7604201415,  4.2759584670,  0.3281834312)
library(ggplot2)  # provides qplot(), labs() and theme_bw()
qplot(sample=stres)+labs(title="QQ Plot/Studentized Residuals")+theme_bw()

[Figure: QQ plot of the studentized residuals]

As you can see, most residuals fall on or close to the line. The exceptions are the two points in the upper right corner. I would like to identify them so that I can study those particular observations more closely. Could you please tell me how to do this?

Thank you.

JohnK
  • Using `qqPlot` from `library(car)` should be easier. Or, if you are wedded to `ggplot2`, you may find this useful: http://stackoverflow.com/questions/10526005/is-there-any-way-to-use-the-identify-command-with-ggplot-2 – KFB Dec 27 '14 at 23:32
  • Have a look here: http://stackoverflow.com/questions/4357031/qqnorm-and-qqline-in-ggplot2/27191036#27191036 I add numbers to the observations that are outside the confidence interval. – Rentrop Dec 27 '14 at 23:33
  • @RStudent This produces the values. Is there any way I can identify the observations on the QQ plot? – JohnK Dec 27 '14 at 23:37
  • Did you have a look at my solution? What's wrong with it? – Rentrop Dec 27 '14 at 23:40

1 Answer

> order(stres, decreasing = TRUE)[1:2]
[1]  47 112

> stres[order(stres, decreasing = TRUE)[1:2]]
[1] 5.081862 4.275958
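
Once the indices are known, they can be written next to the corresponding points on a base-graphics QQ plot. The following is only a minimal sketch: `label_extremes_qq()` and its `n_label` argument are hypothetical names introduced here, and it assumes that the observations of interest are the largest values, as in the question. It relies on `qqnorm()` returning its coordinates in the same order as the input vector.

```r
# label_extremes_qq() is a hypothetical helper built on base graphics only.
label_extremes_qq <- function(x, n_label = 2) {
  # Indices (positions in x) of the n_label largest values
  idx <- order(x, decreasing = TRUE)[seq_len(n_label)]
  # qqnorm() draws the plot and returns the x/y coordinates in input order
  qq <- qqnorm(x, main = "QQ Plot/Studentized Residuals")
  qqline(x)
  # Write each observation's index to the left of its point
  text(qq$x[idx], qq$y[idx], labels = idx, pos = 2)
  invisible(idx)
}

# Usage, with stres defined as in the question:
# label_extremes_qq(stres)
```

Calling it on `stres` should label the same two observations that `order()` reports above.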

If you want to access the values qplot uses you can do the following:

# ggplot_build() exposes the computed layer data without relying on print()'s return value
plt <- ggplot_build(qplot(sample=stres)+labs(title="QQ Plot/Studentized Residuals")+theme_bw())
plt[["data"]][[1]]

And you get the same result:

> sort(plt[["data"]][[1]]$sample, decreasing = TRUE)[1:2]
[1] 5.081862 4.275958
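
The data that the plot uses are sorted, so the original observation numbers are no longer attached to the points. One possible way around this, sketched below, is to build the QQ coordinates by hand and keep the index alongside each value. `qq_label_data()` and `plot_qq_labelled()` are hypothetical helpers, not part of ggplot2, and the choice to label the observations with the largest absolute values is an assumption made for illustration.

```r
# Build the QQ coordinates by hand so each point keeps its original index.
# qq_label_data() is a hypothetical helper, not part of ggplot2.
qq_label_data <- function(x, n_label = 2) {
  n <- length(x)
  ord <- order(x)                        # original indices, smallest to largest
  d <- data.frame(
    index       = ord,
    sample      = x[ord],
    theoretical = qnorm(ppoints(n))      # same plotting positions as qqnorm()
  )
  # Flag the n_label observations with the largest absolute values (assumption)
  extreme <- order(abs(x), decreasing = TRUE)[seq_len(n_label)]
  d$label <- ifelse(d$index %in% extreme, as.character(d$index), "")
  d
}

# Draw the QQ plot and print the index next to each flagged point.
plot_qq_labelled <- function(x, n_label = 2) {
  d <- qq_label_data(x, n_label)
  ggplot2::ggplot(d, ggplot2::aes(x = theoretical, y = sample)) +
    ggplot2::geom_point() +
    ggplot2::geom_text(ggplot2::aes(label = label), hjust = 1.3) +
    ggplot2::labs(title = "QQ Plot/Studentized Residuals") +
    ggplot2::theme_bw()
}

# Usage, with stres defined as in the question:
# plot_qq_labelled(stres)
```

Keeping the index as a column means the labels survive the sorting step, and the same data frame can be filtered to pull out the flagged rows for further study.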
DatamineR