5

I am looking for a method to bind lm residuals to an input dataset. The method must add NA for missing residuals and the residuals should correspond to the proper row.

Sample data:

N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

residuals(lm(Y ~ X,data=df,na.action=na.omit))

Residuals should be bound to df.

metasequoia
  • Similar questions [here](http://stackoverflow.com/q/6882709/684229) and [here](http://stats.stackexchange.com/questions/11000/how-does-r-handle-missing-values-in-lm). – Tomas Jul 31 '13 at 18:26

5 Answers

10

Simply change the na.action to na.exclude:

residuals(lm(Y ~ X, data = df, na.action = na.exclude))

na.omit and na.exclude both perform casewise deletion with respect to both the predictors and the criterion (the response). They differ only in that, with na.exclude, extractor functions such as residuals() or fitted() pad their output with NAs for the omitted cases, so the output has the same length as the input variables.

(this is the best solution found here)
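
To make the padding behaviour concrete, here is a minimal sketch that attaches the residuals to the data frame directly. The fixed seed and the column name `res` are assumptions added for illustration; they are not part of the original question.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
N <- 100
Nrep <- 5
X <- runif(N, 0, 10)
Y <- 6 + 2 * X + rnorm(N, 0, 1)
X[sample(which(Y < 15), Nrep)] <- NA
df <- data.frame(X, Y)

# na.exclude drops incomplete rows for fitting, but remembers where they were
fit <- lm(Y ~ X, data = df, na.action = na.exclude)

# residuals() now returns one value per input row, NA where X was missing,
# so it can be assigned as a column without any index bookkeeping
df$res <- residuals(fit)

head(df[is.na(df$X), ])  # rows with a missing predictor carry an NA residual
```

Because the padded vector is already in row order, a plain column assignment is enough.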

Tomas
  • This is the general solution you're looking for, the one that works with missings in any number of predictors or DV, with lm and lme4. – Ruben Dec 08 '14 at 11:40
2

Using merge, or join.

N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

df$id <- rownames(df)

res <- residuals(lm(Y ~ X,data=df,na.action=na.omit))
tmp <- data.frame(res=res)
tmp$id <- names(res)

merge(df,tmp,by="id",sort=FALSE,all.x=TRUE)

If you need to maintain the row order, use join() from the plyr package:

library(plyr) 
join(df,tmp)
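
If loading plyr is not an option, the same lookup can be sketched in base R with match(). This is an alternative to the merge/join approach above, not part of it; the seed and the column name `res` are assumptions for illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
N <- 100
Nrep <- 5
X <- runif(N, 0, 10)
Y <- 6 + 2 * X + rnorm(N, 0, 1)
X[sample(which(Y < 15), Nrep)] <- NA
df <- data.frame(X, Y)

res <- residuals(lm(Y ~ X, data = df, na.action = na.omit))

# names(res) holds the row names of the rows that were actually fitted.
# match() gives, for every row of df, its position in res (NA if dropped),
# and indexing with NA yields NA, so the row order of df is preserved.
df$res <- unname(res[match(rownames(df), names(res))])
```

Matching on row names rather than positions also keeps working when the row names are not the default 1..N.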
Brandon Bertelsen
0

This could be a solution. First, though, note that you do not need c() inside data.frame():

df <- data.frame(X,Y)
# Fill only the rows that were used in the fit; the remaining rows become NA
df$Res[!is.na(df$X)] <- residuals(lm(Y ~ X, data = df, na.action = na.omit))
Didzis Elferts
0
"[<-"(df, !is.na(df$X), "res", residuals(lm(Y ~ X,data=df,na.action=na.omit)))

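will do the trick. Note that this call returns a modified copy of df; assign the result back to df to keep the new column.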
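
The functional call above can also be written with the usual replacement syntax, which modifies df in place. A minimal sketch, assuming a fixed seed for reproducibility and the same column name "res":

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
N <- 100
Nrep <- 5
X <- runif(N, 0, 10)
Y <- 6 + 2 * X + rnorm(N, 0, 1)
X[sample(which(Y < 15), Nrep)] <- NA
df <- data.frame(X, Y)

# Assign residuals only to the rows with a non-missing X; the new column
# "res" is created automatically and left as NA for the remaining rows
df[!is.na(df$X), "res"] <- residuals(lm(Y ~ X, data = df, na.action = na.omit))
```

This only accounts for missing values in X; if Y could also be missing, the row filter would need to be complete.cases(df[, c("X", "Y")]) instead.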

Sven Hohenstein
0
N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

R.all <- rep(NA_real_, length(X))  # numeric vector of NAs, one slot per input row
res <- residuals(lm(Y ~ X, data = df, na.action = na.omit))
i <- as.numeric(names(res))  # row positions of the non-missing residuals (relies on default row names)
R.all[i] <- res  # assign residuals to their correct positions
R.all