Merge residuals to data with missing observations: na.exclude not working

Question

I have the following command to estimate the residuals from a linear regression and then merge them into the data. The data has missing values in the variable of interest, but as I'm doing this for many variables, I would like the residuals for rows with NA values to just be NA.

I've read that `na.action = na.exclude` should do the job, but it doesn't seem to do anything. Do you have any idea why na.action is not working?


d <- lm(log(br_prix) ~ sectoryear, data=test,na.action = na.exclude)

P1 <- as.data.frame(d$residuals)

P1 <- P1 %>% dplyr::rename(lnprice = "d$residuals")

test <- cbind(test,P1)

Minor note: `P1 <- as.data.frame(d$residuals); P1 <- P1 %>% dplyr::rename(lnprice = "d$residuals")` is a long way to write `P1 <- data.frame(lnprice = d$residuals)` — Gregor Thomas, Apr 18 '23 at 18:57
Please provide enough code so others can better understand or reproduce the problem. — Community, Apr 19 '23 at 08:29
Thanks for the shorter command to rename and create a data frame! The predict() method however gives values to the observations with NA. — Xavier Koch, Apr 19 '23 at 09:06

score 2 · Answer 1 · answered Apr 18 '23 at 22:30

Because cbind() requires its arguments to have the same number of rows, we would expect an error when joining the data (which still contains the rows with missing values) to the shorter residual vector. Luckily, the residuals of an lm object carry a names attribute holding the row names of the non-missing rows of the original data (test). Using these names, we can join the residuals to the data, leaving NA in the residual column for the rows with missing values. I created a simple data frame containing two missing values to illustrate the solution:

x <- c(4, 7, 8, 6, 9, 9.8)
y <- c(5, 8, NA, 8.7, NA, 9.8)

test <- data.frame(x,y)
test
    x   y
1 4.0 5.0
2 7.0 8.0
3 8.0  NA
4 6.0 8.7
5 9.0  NA
6 9.8 9.8

d <- lm(y ~ x, data = test)
attributes(d$residuals)
$names
[1] "1" "2" "4" "6"

We can see that rows 3 and 5 are dropped from the residuals because y is missing in those rows of the test data. Using the row names, we can join the residuals to the data and give the new residual column any name we like (here I named it "resid"):

test[names(d$residuals), "resid"] <- d$residuals
test
    x   y      resid
1 4.0 5.0 -0.8376430
2 7.0 8.0 -0.1013730
3 8.0  NA         NA
4 6.0 8.7  1.3532037
5 9.0  NA         NA
6 9.8 9.8 -0.4141876
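
As a possible alternative, and a likely explanation for why `na.action = na.exclude` appeared to do nothing in the question: as far as I know, `na.exclude` does not change what is stored in `d$residuals`. It only changes what extractor functions such as `residuals()` return, padding the dropped rows with NA. A minimal sketch on the same toy data:

```r
# Same toy data as above
x <- c(4, 7, 8, 6, 9, 9.8)
y <- c(5, 8, NA, 8.7, NA, 9.8)
test <- data.frame(x, y)

# Fit with na.exclude so the extractor functions know which rows were dropped
d <- lm(y ~ x, data = test, na.action = na.exclude)

length(d$residuals)   # 4: the stored component still omits rows 3 and 5
length(residuals(d))  # 6: the extractor pads the dropped rows with NA

# The lengths now match the data, so plain column assignment works
test$resid <- residuals(d)
test
```

With this approach no row-name matching is needed, which may be convenient when the same step is repeated over many variables.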

Thank you for your answer it works very well! I love the nice description also! :) — Xavier Koch, Apr 19 '23 at 09:07
