
I have a multiple regression model. I want to add the fitted values and residuals to the original data.frame as two new columns. How can I achieve that? My model in R is like this:

BD_lm <- lm(y ~ x1+x2+x3+x4+x5+x6, data=BD)
summary(BD_lm)

I also extracted the fitted values:

BD_fit<-fitted(BD_lm)

But I want to add the BD_fit values as a column of my original data frame BD, and I don't know how. When I print BD_fit, it just shows a long vector of numbers. The dataset is large, so I can't list them all here.

Asked by titi (edited by Michael Ohlrogge)
  • See this Cross Validated post for useful information on handling predicted values when your regression uses a subset of your total data: https://stats.stackexchange.com/questions/11000/how-does-r-handle-missing-values-in-lm – Michael Ohlrogge May 10 '17 at 23:33

3 Answers


Suppose:

fm <- lm(demand ~ Time, BOD)

Then try this:

cbind(BOD, resid = resid(fm), fitted = fitted(fm))

or this:

BOD$resid <- resid(fm)
BOD$fitted <- fitted(fm)
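One difference between the two forms: cbind() returns a new data frame and leaves BOD itself untouched, so its result has to be assigned somewhere, whereas the two $<- assignments update BOD directly. A minimal check on the same model (the name out is just an illustrative choice):

```r
fm  <- lm(demand ~ Time, data = BOD)

# cbind() builds a new data frame; BOD itself keeps its two original columns
out <- cbind(BOD, resid = resid(fm), fitted = fitted(fm))
names(out)   # "Time" "demand" "resid" "fitted"
names(BOD)   # "Time" "demand"
```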

ADDED:

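If demand (or any other variable used in the model) contains NA values, lm() drops those rows under the default na.action (na.omit), so the fitted values and residuals will be shorter than the number of rows in your data and the lines above will fail. In that case, fit the model with na.action = na.exclude, like this: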

BOD$demand[3] <- NA # set up test data
fm <- lm(demand ~ Time, BOD, na.action = na.exclude)

With na.exclude the same rows are still left out of the fit, but fitted(), resid() and predict() pad their results with NA in those positions, so they have the same length as the original data. Now the previous lines will work.
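A self-contained sketch of the whole NA workflow, comparing the default na.omit with na.exclude. It works on a copy named bod (an illustrative name) so the built-in BOD data set is left alone, and it assumes options("na.action") is still at its default:

```r
# Copy the built-in BOD data set (6 rows) and make one response missing
bod <- BOD
bod$demand[3] <- NA

# Default na.omit drops row 3, so fitted() returns one value too few
fm_omit <- lm(demand ~ Time, data = bod)
length(fitted(fm_omit))   # 5, while nrow(bod) is 6

# na.exclude also drops row 3 when fitting, but pads fitted()/resid() with NA
fm_excl <- lm(demand ~ Time, data = bod, na.action = na.exclude)
bod$fitted <- fitted(fm_excl)
bod$resid  <- resid(fm_excl)
bod   # row 3 has NA in both new columns
```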

Answer by G. Grothendieck (edited by Michael Ohlrogge)
  • I tried what you suggested, but I got an error: "Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 355027, 53467". 355027 is the number of rows in my original data; I'm not sure what 53467 is. Maybe the problem is that my fitted values don't have the same length as the original data? I'm still trying to figure it out. – titi Sep 30 '13 at 21:57
  • Read this: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – G. Grothendieck Sep 30 '13 at 22:59
  • @titi Do you have missing values in BD? You won't get a prediction for any records with missing values, which will make your vector of fitted values shorter than your original data frame. – Matt Parker Sep 30 '13 at 23:28
  • Yes, I figured out that it's because of the missing values. Now I've got it. Thanks. – titi Oct 01 '13 at 19:28
  • Have added some info on NA handling. – G. Grothendieck Oct 01 '13 at 23:48
  • A subtle point I was missing (h/t to [this](http://stackoverflow.com/questions/6882709/how-do-i-deal-with-nas-in-residuals-in-a-regression-in-r) answer for setting me straight) is that `fm$residuals` still won't contain any `NA` with `na.action = na.exclude`; rather, it's the `resid` function that recognizes and interprets this attribute of `fm`. – MichaelChirico Mar 20 '17 at 19:14
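To see the distinction about fm$residuals versus resid(), compare the stored residuals component with what resid() returns for a model fit with na.exclude (again on an illustrative copy named bod):

```r
bod <- BOD
bod$demand[3] <- NA
fm <- lm(demand ~ Time, data = bod, na.action = na.exclude)

length(fm$residuals)  # 5: the stored component holds only the rows used in the fit
length(resid(fm))     # 6: resid() pads with NA using the information in fm$na.action
class(fm$na.action)   # "exclude"
```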
BD_fit <- data.frame(BD_fit)
BD$fit <- BD_fit[[1]]  # [[1]] extracts the column as a plain vector; its length must equal nrow(BD)
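If the model was already fit with the default na.action, so fitted() returns fewer values than the data has rows, one possible alternative (not what this answer does) is to line the fitted values up by row name, since fitted() names its values after the rows that were used in the fit. A sketch on an illustrative copy of BOD, assuming the data frame you add the column to is the same one passed as data to lm():

```r
bod <- BOD
bod$demand[3] <- NA
fm  <- lm(demand ~ Time, data = bod)   # default na.omit: row 3 is dropped
fit <- fitted(fm)                      # 5 values named "1", "2", "4", "5", "6"

# Start with an all-NA column, then fill only the rows that were used in the fit
bod$fit <- NA_real_
bod$fit[match(names(fit), rownames(bod))] <- fit
```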
Answer by Metrics

Without knowing your case in detail: adding a column to a data frame is quite easy. You can just add a new column like so:

df <- data.frame(var1=1:10)
df$var2 <- 11:20

You only have to make sure that the new column has the same length as the existing ones, i.e. one value per row. Otherwise R will refuse to add it or, if the number of rows happens to be a multiple of its length, silently recycle it.
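A short sketch of the three cases for `$<-` on a data frame, including the one exception where a shorter vector is accepted because it gets recycled (which is rarely what you want):

```r
df <- data.frame(var1 = 1:10)

df$var2 <- 11:20                             # same length: added as expected
res <- try(df$var3 <- 1:3, silent = TRUE)    # 10 is not a multiple of 3: error, df unchanged
inherits(res, "try-error")                   # TRUE
df$var4 <- 1:5                               # 10 is a multiple of 5: recycled to 1:5, 1:5
names(df)                                    # "var1" "var2" "var4"
```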

Answer by Florian R. Klein