
I have a large dataset and ran a multiple regression with a large number of (but not all) available variables. I am trying to run a simple regression for comparison and need to use the same observations as those in the multiple regression.

What is the easiest/best way to do this? I was thinking I could create a subset containing just complete observations on the variables in the multiple regression and run both the multiple regression and simple regression on that subset, but I can't figure out how to do that.

Perhaps there is an even easier way to just identify and 'select' the observations used in the multiple regression?

I have done some extensive googling on the subject but can't find a solution so far.

AKJ
  • Welcome to StackOverflow. Can you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and provide any code you've written so far and your dataset using `dput()` (or a portion of your data or a sample dataset)? – jrcalabrese Dec 10 '22 at 15:45

1 Answer


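You can do this with `model.frame()`, which returns the data actually used to fit the model. Under the default `na.action`, rows with `NA` in any model variable have already been dropped. See `?model.frame`:

> `model.frame` (a generic function) and its methods return a `data.frame` with the variables needed to use `formula` and any `...` arguments.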

```r
library(dplyr)
data(storms)
nrow(storms) # pretty big
#> [1] 11859

# multiple regression
fit1 <- lm(pressure ~ wind + year + month + hurricane_force_diameter, data = storms)
df_used_in_fit1 <- model.frame(fit1) %>% as.data.frame()
nrow(df_used_in_fit1) # smaller because of NA values
#> [1] 5350

# simpler regression
fit2 <- lm(pressure ~ wind, data = df_used_in_fit1)
nrow(model.frame(fit2))
#> [1] 5350
```

Note that `model.frame()` only includes the variables used in the original `lm()` call, so the simpler model can only draw on those columns.

```r
names(df_used_in_fit1)
#> [1] "pressure"                 "wind"                     "year"
#> [4] "month"                    "hurricane_force_diameter"
```
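If you also need columns that were not in the multiple regression, one option is to subset the original data instead of reusing the model frame. The sketch below is a minimal illustration, not part of the answer's original code. It uses the built-in `airquality` data rather than `storms`, and the object names (`fit_full`, `aq_used`, `aq_complete`, `fit_simple`) are made up for the example. It assumes the model was fitted with the default `na.action` and without `subset` or `weights` arguments.

```r
# airquality ships with base R; Ozone and Solar.R contain NA values
fit_full <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# Option 1: the row names of the model frame identify the rows lm() kept
used_rows <- rownames(model.frame(fit_full))
aq_used <- airquality[used_rows, ]

# Option 2: build the same subset directly with complete.cases(),
# restricted to the variables that appear in the model formula
vars <- all.vars(formula(fit_full))
aq_complete <- airquality[complete.cases(airquality[, vars]), ]

# Both subsets keep every column of the original data,
# including ones that were not in the multiple regression
fit_simple <- lm(Ozone ~ Wind, data = aq_used)

# Check that both models use the same number of observations
nobs(fit_full) == nobs(fit_simple)

# Check that both options select the same rows
identical(rownames(aq_used), rownames(aq_complete))
```

Option 1 follows whatever rows `lm()` actually used. Option 2 never looks at the fitted model, so it only matches when missing values were the sole reason rows were dropped.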
jrcalabrese
  • That was exactly what I was looking for. Thank you so much - this issue was driving me crazy. – AKJ Dec 10 '22 at 17:49