There are a variety of ways to do this. A simple strategy is to first save your residuals to the data.frame as a new column, then add a second new column that flags whether each residual is an outlier. You can then use that flag column to make a new data.frame without the outliers, to subset your current data.frame, or for whatever else you need. Here is an example:
set.seed(20) #sets the random number seed.
# Test data and test linear model
DF<-data.frame(X=rnorm(200), Y=rnorm(200), Z=rnorm(200))
LM<-lm(X~Y+Z, data=DF)
# Store the residuals as a new column in DF
DF$Resid<-resid(LM)
# Find out what 2 standard deviations is and save it to SD2
SD2<-2*sd(resid(LM))
SD2
#[1] 1.934118
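# Make DF$Outs 1 if a residual is more than 2 standard deviations from the mean, 0 otherwise
# (residuals from a model with an intercept have mean 0, so abs(DF$Resid) is the distance from the mean)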
DF$Outs<-ifelse(abs(DF$Resid)>SD2, 1, 0)
# Plot this, note that DF$Outs is used to set the color of the points.
plot(DF$Resid, col=DF$Outs+1, pch=16,ylim=c(-3,3))

#Make a new data.frame with no outliers
DF2<-DF[!DF$Outs,]
nrow(DF2)
#[1] 189
# Was 200 before, so 11 outliers were removed
# Plot new data
plot(DF2$Resid, col=DF2$Outs+1,pch=16, ylim=c(-3,3))

That is the basic idea. You can combine some of these commands: for instance, you could create the outliers column without saving SD2, and you don't really need two data.frames, since you could just exclude the outlier rows whenever you need to.
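As a rough sketch of that more compact version, the following rebuilds the same test data and model, computes the flag in one step, and excludes the flagged rows on the fly. The names `is_out`, `LM_sub` and `LM_arg` are made up for this illustration, and refitting the model is just one example of "when you need to":

```r
set.seed(20)
# Same test data and test linear model as above
DF <- data.frame(X = rnorm(200), Y = rnorm(200), Z = rnorm(200))
LM <- lm(X ~ Y + Z, data = DF)

# Flag outliers in one step, without storing SD2 or a residual column.
# is_out is a logical vector: TRUE where |residual| > 2 standard deviations.
is_out <- abs(resid(LM)) > 2 * sd(resid(LM))

# Option 1: drop the flagged rows by indexing, only at the point of use
LM_sub <- lm(X ~ Y + Z, data = DF[!is_out, ])

# Option 2: leave DF untouched and use lm()'s own subset argument
LM_arg <- lm(X ~ Y + Z, data = DF, subset = !is_out)

# Both refits use the same rows
nobs(LM_sub) == nobs(LM_arg)
```

A logical flag is used here instead of 1/0 because it can index rows directly; `as.integer(is_out) + 1` would give the same plotting colours as `DF$Outs+1` above.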