I am analyzing real estate data in which year of sale is coded as a factor with levels 2010 to 2018. When I run an OLS model, 2010 is dropped automatically as the reference level, as it should be, but the coefficient, standard error, and test statistics for 2018 are reported as NA. alias(ols) tells me that year2018 is the problem, but I see no indication that year2018 is actually perfectly collinear with any other variable. How do I figure out what the problem is?
I've checked that year is coded correctly as a factor, and it is.
The OLS output for the year dummies (2011 to 2018) shows:
PSUB1$year2011 -1.598e-01 1.755e-02 -9.105 < 2e-16 ***
PSUB1$year2012 -2.060e-01 1.573e-02 -13.101 < 2e-16 ***
PSUB1$year2013 -1.807e-01 1.400e-02 -12.908 < 2e-16 ***
PSUB1$year2014 -1.402e-01 1.341e-02 -10.462 < 2e-16 ***
PSUB1$year2015 -1.250e-01 1.284e-02 -9.739 < 2e-16 ***
PSUB1$year2016 -9.490e-02 1.249e-02 -7.595 3.86e-14 ***
PSUB1$year2017 -4.511e-02 1.272e-02 -3.546 0.000396 ***
PSUB1$year2018 NA NA NA NA
2010 is taken out as it should be, but 2018 is all NA.
I used
alias(ols)
and
ld.vars <- attributes(alias(ols)$Complete)$dimnames[[1]]
to identify that PSUB1$year2018 is the issue, but this doesn't tell me which variable(s) it is perfectly collinear with. Since 2010 is already dropped as the reference level, the year factor on its own shouldn't be the problem.
I would expect PSUB1$year2018 to get a coefficient and standard error like the other year dummies. The problem only occurs with this factor; the other factors in the model work just fine.
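
One way the dependency could be read off, sketched on made-up data rather than the real model: the row of alias(...)$Complete for the dropped term holds the weights on the retained columns, so its non-zero entries name what year2018 is a linear combination of. Everything below (the data frame d, the regime factor, the fit object) is hypothetical and only mimics the symptom. It also assumes the model is fitted with a data= argument, so the term is named year2018 rather than PSUB1$year2018.

```r
# Hypothetical toy data: a year factor plus a second factor ("regime")
# that happens to single out 2018, which reproduces the NA coefficient.
set.seed(1)
d <- data.frame(year = factor(rep(2010:2018, each = 20)))
d$regime <- factor(ifelse(d$year == "2018", "new", "old"),
                   levels = c("old", "new"))
d$y <- rnorm(nrow(d))

fit <- lm(y ~ regime + year, data = d)

# The 2018 dummy comes back as NA because it duplicates another column.
coef(fit)["year2018"]

# Rows of $Complete are the dropped terms; columns are the retained ones.
a <- alias(fit)$Complete
dep <- a["year2018", ]

# Keep only the retained columns with a non-zero weight: these are the
# terms that reproduce year2018 exactly.
culprits <- dep[abs(dep) > 1e-8]
culprits
```

On the real model, the same lookup would use the row name stored in ld.vars in place of "year2018".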