
I am taking an introductory course in linear regression in college this semester. For one of my assignments, I am required to analyse a dataset using R.

Allow me to first share part of my code:

log_Metab <- log(Metab)
mammal.lm.1 <- lm(Life ~ log_Metab)
plot(mammal.lm.1, which = 2)

Basically, my dataset contains information about the metabolism rate (Metab) and lifespan (Life) of 95 different mammals and I need to check whether there is a linear relationship between the two characteristics.

Now, the third line of the code that I pasted generates the normal Q-Q plot of the linear regression, as shown below:

[Figure: Normal Q-Q plot of the regression, with three observations numbered automatically by R]

What I would like to know is simple and is stated in the title of the post: is it possible to use the identify function on a plot such as this Q-Q plot? The three numbered observations in the plot are selected automatically by R, not by me. If it is possible, please show and explain the code I should type. For example, how can I identify the point immediately to the left of the 90th observation?

P.S. I apologise in advance if this is something trivial, but I have only been using R for about a month and this is already beyond the scope of what I have learnt :)

Ethan Mark
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. You cannot use `identify` directly with the default plot method, but you can use `qqnorm` to make your own plot of the residuals. Basically, follow the steps at https://stackoverflow.com/questions/49547679/how-to-identify-a-datapoint-in-a-qqplot. You just need to do: `qqpoints <- qqnorm(resid(mammal.lm.1))` – MrFlick Apr 15 '21 at 06:44

1 Answer


It is possible to do what you want by computing the coordinates separately from the plot. First we need reproducible data since you did not provide any. The data set mtcars comes with R (as do many other data sets):

data(mtcars)
log_hp <- log(mtcars$hp)
mpg.lm <- lm(mpg~log_hp, mtcars)

We have computed a linear regression for mpg (miles per gallon) from the log of hp (horsepower). The command plot(mpg.lm) will call a special version of the plot command, plot.lm, and prepare 4 plots. By reading the manual page at ?plot.lm we can see that the plot you want is the 2nd and we can access that plot with the following:

plot(mpg.lm, which=2)

Now we need the standardized residuals and the theoretical quantiles:

mpg.res <- rstandard(mpg.lm)
out <- qqnorm(mpg.res, plot.it=FALSE)
coords <- cbind(x=out$x, y=out$y)

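The matrix coords holds the theoretical quantiles (x) and the standardized residuals (y), and its row names are the car names. That gives us everything we need to identify points on the plot. identify is interactive: run it while the plot from plot(mpg.lm, which=2) is still the active graphics device, click near the points you want labelled, and then end the selection (how you stop depends on the device; typically a right-click or Esc). I'll make the identified points red: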

identify(coords, labels=rownames(coords), cex=.75, col="red")

[Figure: QQ plot with the identified points labelled in red]
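If you already know which observation you are interested in, you can also label its neighbour without clicking. The following is only a sketch under some assumptions: `left_neighbour` is a hypothetical helper (not part of R), and the target car is an arbitrary example. It sorts the points by theoretical quantile, looks up the one just to the left of the target, and labels it with `text()`.

```r
# Rebuild the objects from above so this block runs on its own.
data(mtcars)
log_hp <- log(mtcars$hp)
mpg.lm <- lm(mpg ~ log_hp, mtcars)
mpg.res <- rstandard(mpg.lm)
out <- qqnorm(mpg.res, plot.it = FALSE)
coords <- cbind(x = out$x, y = out$y)

# Hypothetical helper: row name of the point immediately to the left of
# `target` on the theoretical-quantile axis (NA if there is none).
left_neighbour <- function(coords, target) {
  ord <- order(coords[, "x"])                   # row indices, left to right
  pos <- match(target, rownames(coords)[ord])   # position of the target
  if (is.na(pos) || pos == 1) return(NA_character_)
  rownames(coords)[ord[pos - 1]]
}

target <- "Toyota Corolla"   # arbitrary example, not from the original post
nb <- left_neighbour(coords, target)

# Draw the diagnostic plot and label the neighbour in red.
plot(mpg.lm, which = 2)
if (!is.na(nb)) {
  text(coords[nb, "x"], coords[nb, "y"], labels = nb,
       pos = 2, cex = 0.75, col = "red")
}
```

With your own model, the row names will probably be the observation numbers stored as text, so the target would be something like "90" rather than a car name.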

dcarlson
  • Thank you for your answer and my apologies for not attaching the dataset! You have indeed solved my problem! However, I note that when I manually generate the QQ plot, it does not come with the 45-degree dotted line that the automatically generated QQ plot has. May I know if it is possible to add this line and, if so, how? – Ethan Mark Apr 16 '21 at 06:56
  • I get the line with `plot(mpg.lm, which=2)` as on the image above. – dcarlson Apr 16 '21 at 14:09
  • I see. Yes, I missed that out. Thank you for your help! – Ethan Mark Apr 16 '21 at 14:36
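
For anyone who does draw the QQ plot by hand with `qqnorm`, a reference line can be added with `qqline`. This is a minimal sketch using the same mtcars model; the axis label and line style are my own choices, intended to resemble the diagnostic plot rather than reproduce it exactly. Note that by default `qqline` passes through the first and third quartiles, so it is not strictly a 45-degree line.

```r
data(mtcars)
log_hp <- log(mtcars$hp)
mpg.lm <- lm(mpg ~ log_hp, mtcars)
mpg.res <- rstandard(mpg.lm)

# Draw the Q-Q plot by hand; qqnorm also returns the plotted coordinates.
out <- qqnorm(mpg.res, ylab = "Standardized residuals")

# Add a dotted reference line through the quartiles of the residuals.
qqline(mpg.res, lty = 3, col = "gray50")

# In an interactive session, points can then be picked by clicking.
if (interactive()) {
  identify(out$x, out$y, labels = names(out$y), cex = 0.75, col = "red")
}
```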