
I have a data frame structured like this:

set.seed(123)
data <- data.frame(
  ID = factor(letters[seq(20)]),
  Location = rep(c("alph", "brav", "char", "delt"), each = 5),
  Var1 = rnorm(20),
  Var2 = rnorm(20),
  Var3 = rnorm(20)
)

I have built a linear model: mod1 <- lm(Var1 ~ Location, data). When I call plot(mod1) on the model object, the outliers are labeled with their row index. Is there a way to label those points with the corresponding value in ID instead? In this example, observations 6, 16, and 18 are labeled in the plots, and I want them to be labeled f, p, and r, respectively, because those are their values in ID.

Ryan
  • Can you provide a more detailed and reproducible example of your dataset? (See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – dc37 Feb 26 '20 at 19:49

1 Answer


The diagnostic plots are drawn by stats:::plot.lm, which has two relevant arguments:

id.n: number of points to be labelled in each plot, starting with
      the most extreme.

labels.id: vector of labels, from which the labels for extreme points
          will be chosen.  ‘NULL’ uses observation numbers.

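By default id.n = 3, so each plot labels its three most extreme observations: the largest absolute (standardized) residuals in the residual plots, and the largest Cook's distances in the Cook's distance and leverage plots. In this example the design is balanced, so every observation has the same leverage and both criteria select the same three points. I am including this as part of the answer because three points are labeled in every fit, so you should be careful about interpreting them as outliers.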

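To find the three observations with the largest Cook's distance yourself:

mod1 <- lm(Var1 ~ Location, data = data)
# indices of the three largest Cook's distances, most extreme first
outl <- order(-cooks.distance(mod1))[1:3]
outl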
[1] 18  6 16
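Because plot.lm picks the labeled points differently depending on the panel, it can help to compare the two rankings directly. The following is a minimal sketch, assuming the data frame from the question; the names by_cook and by_resid are introduced here for illustration only.

```r
# Recreate the data and model from the question
set.seed(123)
data <- data.frame(
  ID = factor(letters[seq(20)]),
  Location = rep(c("alph", "brav", "char", "delt"), each = 5),
  Var1 = rnorm(20),
  Var2 = rnorm(20),
  Var3 = rnorm(20)
)
mod1 <- lm(Var1 ~ Location, data = data)

# Rank observations by Cook's distance and by absolute residual
by_cook  <- order(-cooks.distance(mod1))[1:3]
by_resid <- order(-abs(residuals(mod1)))[1:3]

# Translate the row indices into the ID values
as.character(data$ID[by_cook])
as.character(data$ID[by_resid])
```

Both vectors should hold the same three IDs here, because a balanced one-way design gives every observation equal leverage. With unbalanced data or continuous predictors the two rankings can differ.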

To plot, you can either pass the IDs you want to labels.id, or build the plot from scratch and add the labels yourself:

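par(mfrow = c(1, 2))
# Option 1: let plot.lm label the points using ID
plot(mod1, which = 1, labels.id = data$ID)
# Option 2: build the residuals-vs-fitted plot by hand
plot(fitted(mod1), residuals(mod1))
panel.smooth(fitted(mod1), residuals(mod1))
# Add the ID labels, nudged slightly to the right of each point
text(fitted(mod1)[outl] + 0.01, residuals(mod1)[outl],
     data$ID[outl], col = "red")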


To label the points with ID in all of the diagnostic plots:

plot(mod1,labels.id=data$ID)
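
An alternative that avoids passing labels.id on every call: the default for labels.id in plot.lm is names(residuals(x)), and those names come from the row names of the data used in the fit. A sketch, assuming the IDs are unique (data2 and mod2 are illustrative names, not part of the original answer):

```r
# Recreate the data from the question
set.seed(123)
data <- data.frame(
  ID = factor(letters[seq(20)]),
  Location = rep(c("alph", "brav", "char", "delt"), each = 5),
  Var1 = rnorm(20),
  Var2 = rnorm(20),
  Var3 = rnorm(20)
)

# Use the ID column as row names before fitting
data2 <- data
rownames(data2) <- as.character(data2$ID)
mod2 <- lm(Var1 ~ Location, data = data2)

# The residuals now carry the IDs as names
head(names(residuals(mod2)))

# plot.lm picks these names up as its default labels
plot(mod2, which = 1)
```

This only works when the ID values are unique, since row names cannot contain duplicates.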
StupidWolf