
I have the following code to estimate the residuals from a linear regression and then merge them into the data. The data has missing values in the variable of interest, and since I'm doing this for many variables, I would like the residuals for the rows with NA values to simply be NA.

I've read that na.action = na.exclude should do the job, but it doesn't seem to do anything. Do you have an idea why na.action is not working?


library(dplyr)  # provides the %>% pipe

d <- lm(log(br_prix) ~ sectoryear, data = test, na.action = na.exclude)

P1 <- as.data.frame(d$residuals)

P1 <- P1 %>% dplyr::rename(lnprice = "d$residuals")

test <- cbind(test, P1)
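A likely reason the option seems to have no effect: na.exclude only changes what extractor functions such as residuals() return (they pad the dropped rows back with NA), while the $residuals component stored in the fit always covers only the rows actually used. A minimal sketch on invented toy data (only the column names br_prix and sectoryear come from the question; the values are made up):

```r
# Hypothetical toy data (values invented): one missing price in row 3
test <- data.frame(
  br_prix    = c(10, 12, NA, 15, 11, 14),
  sectoryear = factor(c("a", "a", "b", "b", "c", "c"))
)
d <- lm(log(br_prix) ~ sectoryear, data = test, na.action = na.exclude)

nrow(test)            # 6 rows in the data
length(d$residuals)   # 5: the stored component only covers the rows used in the fit
class(d$na.action)    # "exclude": the dropped row is recorded in the fit
length(residuals(d))  # 6: the extractor pads the dropped row back with NA

# residuals(d) already has one entry per row of test,
# so it can be attached directly as a new column
test$lnprice <- residuals(d)
```

Under this assumption, using residuals(d) instead of d$residuals should also make the as.data.frame()/cbind() steps above line up, since both objects then have one row per row of test.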
  • Minor note: `P1 <- as.data.frame(d$residuals); P1 <- P1 %>% dplyr::rename(lnprice = "d$residuals")` is a long way to write `P1 <- data.frame(lnprice = d$residuals)` – Gregor Thomas Apr 18 '23 at 18:57
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 19 '23 at 08:29
  • Thanks for the shorter command to rename and create a data frame! The predict() method, however, gives values to the observations with NA. – Xavier Koch Apr 19 '23 at 09:06

1 Answer


Because cbind() requires its arguments to have the same number of rows, joining the data (which contains missing values) with the residuals is expected to throw an error. Luckily, the residuals of an lm object carry a names attribute holding the row names of the rows of the original data (test) that were used in the fit, i.e. those without missing values. Using these names, we can join the residuals to the data and fill the rows with missing values with NA in the residual column. I created a simple data frame containing two missing values to illustrate the solution:

x <- c(4, 7, 8, 6, 9, 9.8)
y <- c(5, 8, NA, 8.7, NA, 9.8)

test <- data.frame(x,y)
test
    x   y
1 4.0 5.0
2 7.0 8.0
3 8.0  NA
4 6.0 8.7
5 9.0  NA
6 9.8 9.8

d <- lm(y ~ x, data = test)
attributes(d$residuals)
$names
[1] "1" "2" "4" "6" 

We can see that rows 3 and 5 are dropped from the residuals because y is missing in those rows of test. Using these row names, we can assign the residuals to the matching rows of the data under a column name of our choice (here "resid"):

test[names(d$residuals), "resid"] <- d$residuals
test
    x   y      resid
1 4.0 5.0 -0.8376430
2 7.0 8.0 -0.1013730
3 8.0  NA         NA
4 6.0 8.7  1.3532037
5 9.0  NA         NA
6 9.8 9.8 -0.4141876
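Since the question mentions doing this for many variables, the same row-name indexing can be wrapped in a loop. A minimal sketch, assuming every outcome is regressed on the same right-hand side; the helper add_residuals(), the columns y1 and y2 with their values, and the "_resid" suffix are all hypothetical:

```r
# Hypothetical example data with missing values in two outcome columns
test <- data.frame(
  x  = c(4, 7, 8, 6, 9, 9.8),
  y1 = c(5, 8, NA, 8.7, NA, 9.8),
  y2 = c(NA, 2.1, 3.3, 2.9, 4.0, 4.4)
)

# Hypothetical helper: regress each outcome on `rhs` and store the residuals
# in a new column "<outcome>_resid", leaving NA where the outcome is missing
add_residuals <- function(data, outcomes, rhs = "x") {
  for (v in outcomes) {
    fit <- lm(reformulate(rhs, response = v), data = data)
    r <- residuals(fit)                      # named by the row names of the rows used
    col <- paste0(v, "_resid")
    data[[col]] <- NA_real_                  # start with NA for every row
    data[names(r), col] <- r                 # fill only the rows used in the fit
  }
  data
}

test <- add_residuals(test, c("y1", "y2"))
test
```

Initialising each column with NA_real_ before filling it keeps the column numeric even if, for some outcome, no row were dropped.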