I have a data.frame with several hundred variables, in which missing values are denoted by NA. There are 571 observations in total. I'm only interested in 20 of the variables in this data.frame. In other words, I want to define a complete observation as any observation that has data in all 20 variables of interest.
One way of getting the count is to run a linear regression on those 20 variables, which deletes any observations that have missing values and reports something like:
(196 observations deleted due to missingness)
This lets me infer that my sample size is 571 minus 196, i.e. 375. But there must be a better way to count the complete observations. Any ideas?
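For reference, the regression workaround described above might look roughly like the sketch below. It uses made-up data: the column names y, x1, x2 and unused are placeholders rather than the real variables, and it assumes R's default na.action of na.omit.

```r
# Toy data standing in for the real data.frame (all names are hypothetical)
df <- data.frame(
  y      = c(1.2, NA,  3.1, 4.8, 5.0, 6.3, 7.1, 8.4),
  x1     = c(2.0, 1.5, NA,  3.3, 4.1, 5.2, 6.0, 6.6),
  x2     = c(0.4, 0.9, 1.1, 1.7, NA,  2.5, 2.1, 3.0),
  unused = c(NA,  NA,  1,   2,   3,   4,   NA,  5)  # not of interest
)

# Regress on the variables of interest only; with the default
# na.action (na.omit), lm() drops every row with an NA in y, x1 or x2
fit <- lm(y ~ x1 + x2, data = df)

n_used    <- nobs(fit)              # rows actually used in the fit
n_dropped <- length(fit$na.action)  # rows deleted due to missingness

# The sample size is then inferred as total rows minus dropped rows
n_inferred <- nrow(df) - n_dropped
```

Note that NAs in the unused column do not affect the count, because that column never enters the model formula.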
Thank you in advance!