
I am fitting a linear regression model with 33 continuous explanatory variables. The model and its output are:

```
ESTF <- lm(log(HousePrice_2$price.yen.m2.) ~ ., data = HousePrice_2)
Call:
    lm(formula = log(HousePrice_2$V1) ~ E1 + E3 + E4 + E5 + E6 + 
        E7 + E9 + E11 + E12 + E13 + E14 + E15 + E17 + E18 + E19 + 
        E21 + E22 + E23 + E24 + E25 + E26 + E27 + E28 + E29 + E30 + 
        E31 + E34 + E35 + E36 + E37 + E38 + E39 + E45, data = HousePrice_2)

Residuals:
        Min          1Q      Median          3Q         Max 
-0.98457132 -0.20283176 -0.01132873  0.21072971  1.02592116 

Coefficients:
                 Estimate    Std. Error  t value   Pr(>|t|)    
(Intercept)  7.478146e+01  1.158189e+01  6.45676 1.4073e-10 ***
E1          -1.678420e+08  2.282162e+07 -7.35452 3.0192e-13 ***
E3           2.481657e+08  3.196710e+07  7.76316 1.4518e-14 ***
E4           1.048053e+07  1.342064e+06  7.80926 1.0214e-14 ***
E5           1.154938e+07  1.521833e+06  7.58912 5.3834e-14 ***
E6           2.047969e+07  2.895253e+06  7.07354 2.2325e-12 ***
E7           3.129394e+08  4.934785e+07  6.34150 2.9386e-10 ***
E9           2.332690e+06  5.895178e+05  3.95694 7.9170e-05 ***
E11         -2.734790e+07  4.359309e+06 -6.27345 4.5132e-10 ***
E12         -4.761917e+08  7.589544e+07 -6.27431 4.4888e-10 ***
E13         -1.770340e+06  4.659259e+05 -3.79962 0.00015024 ***
E14         -1.210883e+06  2.333111e+05 -5.19000 2.3664e-07 ***
E15         -2.131764e+07  3.831284e+06 -5.56410 3.0746e-08 ***
E17          2.540183e+07  3.647269e+06  6.96462 4.7576e-12 ***
E18          6.851275e+08  9.627961e+07  7.11602 1.6573e-12 ***
E19          2.249070e+08  2.879451e+07  7.81076 1.0097e-14 ***
E21         -1.927894e+07  2.474312e+06 -7.79164 1.1686e-14 ***
E22         -1.602739e+08  2.049514e+07 -7.82009 9.4013e-15 ***
E23          7.541001e+08  9.874725e+07  7.63667 3.7732e-14 ***
E24         -9.934404e+08  1.268787e+08 -7.82984 8.7246e-15 ***
E25         -1.698081e+00  1.034052e+00 -1.64216 0.10074917    
E26          9.775790e+08  1.248296e+08  7.83131 8.6272e-15 ***
E27         -3.644899e+09  4.655646e+08 -7.82899 8.7820e-15 ***
E28          1.247317e+08  1.587659e+07  7.85633 7.1192e-15 ***
E29         -3.710396e+09  4.740074e+08 -7.82772 8.8679e-15 ***
E30          1.053938e+10  1.346325e+09  7.82826 8.8313e-15 ***
E31          9.306411e+09  1.188832e+09  7.82820 8.8354e-15 ***
E34         -5.903855e+08  7.572487e+07 -7.79645 1.1264e-14 ***
E35          3.237143e+08  4.148075e+07  7.80396 1.0636e-14 ***
E36         -2.877417e+06  3.704119e+05 -7.76815 1.3976e-14 ***
E37          1.111132e+08  1.430957e+07  7.76496 1.4320e-14 ***
E38          2.427448e+00  9.945393e-01  2.44078 0.01476142 *  
E39         -1.856438e+07  2.405426e+06 -7.71771 2.0495e-14 ***
E45         -1.006570e+05  1.299734e+04 -7.74443 1.6738e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3303115 on 1632 degrees of freedom
Multiple R-squared:  0.5202942, Adjusted R-squared:  0.5105943 
F-statistic: 53.63895 on 33 and 1632 DF,  p-value: < 2.2204e-16
```
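
Coefficients of order 1e8 to 1e10 with alternating signs and nearly identical t values are a common symptom of predictors that are almost exact linear combinations of one another. The minimal sketch below reproduces that pattern on simulated data; the variables `x1`, `x2`, `x3` and `y` are hypothetical and not taken from `HousePrice_2`. It also uses base R's `kappa()`, which estimates the condition number of the model matrix (large values point to a nearly singular design).

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x3 <- rnorm(n)
# Hypothetical predictor: x2 is x1 plus a tiny amount of noise
x2 <- x1 + rnorm(n, sd = 1e-4)
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)

# x1 and x2 typically get large estimates of opposite sign;
# only their sum (close to the true effect of 2) is well determined
print(coef(fit))
print(sum(coef(fit)[c("x1", "x2")]))

# Estimated condition number of the model matrix;
# very large values indicate near-collinear columns
print(kappa(model.matrix(fit)))
```

The same `kappa(model.matrix(ESTF))` call could be run on the real model; the threshold at which a value counts as "too large" is a judgment call rather than a fixed rule.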

I suspect multicollinearity, so I tried `vif()` and `alias()`. However, `vif()` returned `NaN` for every variable, and `alias()` returned nothing except the model formula:

```
> vif(ESTF)
 E1  E3  E4  E5  E6  E7  E9 E11 E12 E13 E14 E15 E17 E18 E19 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E34 E35 
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
E36 E37 E38 E39 E45 
NaN NaN NaN NaN NaN

> alias(ESTF)
Model :
log(HousePrice_2$V1) ~ E1 + E3 + E4 + E5 + E6 + E7 + E9 + E11 + 
    E12 + E13 + E14 + E15 + E17 + E18 + E19 + E21 + E22 + E23 + 
    E24 + E25 + E26 + E27 + E28 + E29 + E30 + E31 + E34 + E35 + 
    E36 + E37 + E38 + E39 + E45
```
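
The variance inflation factor of predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. Assuming `vif()` here is `car::vif`, it obtains these values from ratios of determinants of the correlation matrix of the coefficient estimates; when many predictors are nearly collinear, those determinants can underflow to zero, and 0/0 gives `NaN`. A minimal sketch that computes VIFs directly from auxiliary regressions (the helper `manual_vif` and the simulated data are hypothetical) avoids that particular numerical path:

```r
# Hypothetical helper (not part of car): VIF_j = 1 / (1 - R_j^2), where R_j^2
# is the R-squared from regressing predictor j on all the other predictors.
manual_vif <- function(X) {
  X <- as.data.frame(X)
  vapply(names(X), function(v) {
    f  <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)  # Inf if predictor v is an exact combination of the others
  }, numeric(1))
}

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x3 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-3)  # simulated, nearly collinear with x1
v  <- manual_vif(data.frame(x1, x2, x3))
print(signif(v, 3))
```

Here `x1` and `x2` get very large VIFs while `x3` stays near 1. Applied to a data frame holding only the explanatory columns of `HousePrice_2`, it would show which E variables are (nearly) determined by the others.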

I'm new to R. Could anyone explain these results and help me select variables?
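
One common (but not the only) way to select variables under multicollinearity is to drop, one at a time, the predictor with the largest VIF until every VIF is below a cutoff. The sketch below assumes a cutoff of 10, which is a rule of thumb rather than a fixed standard; the helpers `vif_aux` and `prune_by_vif` and the simulated data are hypothetical.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2) from auxiliary regressions
vif_aux <- function(X) {
  vapply(names(X), function(v) {
    f  <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Hypothetical helper: repeatedly drop the predictor with the largest VIF
# until all remaining VIFs are below `threshold`; returns the kept names.
prune_by_vif <- function(X, threshold = 10) {
  X <- as.data.frame(X)
  repeat {
    if (ncol(X) < 2) break            # VIF needs at least two predictors
    v <- vif_aux(X)
    if (max(v) < threshold) break
    worst <- names(which.max(v))
    message("dropping ", worst, " (VIF = ", signif(max(v), 3), ")")
    X <- X[, setdiff(names(X), worst), drop = FALSE]
  }
  names(X)
}

set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-3)  # nearly collinear with x1
x3 <- rnorm(n)
x4 <- rnorm(n)
kept <- prune_by_vif(data.frame(x1, x2, x3, x4), threshold = 10)
print(kept)
```

This pruning looks only at the predictors, not at the response. An alternative that does use the response is AIC-based stepwise selection with `stats::step()`, ideally run after the near-duplicate predictors have been removed.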

Zhan
  • Hi, can you please share your data so that we can re-run your analysis and help you better? You could run `dput(HousePrice_2)` and paste the result into the question. – BrianLang Oct 07 '20 at 16:14
  • @BrianLang Thanks for your kind reply! However, the dataset has 1666 samples, which is too large to paste in full with `dput()`. I am looking for a way to post the whole dataset. – Zhan Oct 08 '20 at 00:20
  • Hi Zhan, you could try subsetting the data before you call `dput`. First run `set.seed(1234)` so that we get the same subset as you. Then choose which rows to keep (maybe 20% of your data?) with `which_rows <- sample(1:nrow(HousePrice_2), size = round(nrow(HousePrice_2)*.2))`, and then run `dput(HousePrice_2[which_rows,])`. – BrianLang Oct 08 '20 at 06:26
  • Do the answers you find [here](https://stackoverflow.com/questions/40167425/error-in-calculating-vif-variance-inflation-factor) help any? – BrianLang Oct 08 '20 at 06:29
  • This may indicate partial aliasing. Try running `alias()` with `partial = TRUE`. – Bryan Nov 26 '21 at 16:53
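
The distinction behind Bryan's suggestion can be checked on simulated data. `alias()` reports a `Complete` component only when a predictor is an exact linear combination of others (in that case `lm()` also returns `NA` for its coefficient). Near-exact collinearity leaves the model full rank, so only the formula is printed, as in the question. With `partial = TRUE`, `alias()` additionally returns a `Partial` component based on correlations between coefficient estimates. The sketch below uses hypothetical variables and also inspects those correlations directly with `cov2cor(vcov(fit))`.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)

# Exact collinearity: x2_exact is a linear combination of x1 and x3
x2_exact  <- 2 * x1 - x3
fit_exact <- lm(y ~ x1 + x3 + x2_exact)
print(coef(fit_exact))            # x2_exact is NA (aliased)
print(alias(fit_exact)$Complete)  # expresses x2_exact in terms of the other columns

# Near collinearity: a little noise keeps the design full rank
x2_near  <- 2 * x1 - x3 + rnorm(n, sd = 1e-3)
fit_near <- lm(y ~ x1 + x3 + x2_near)
print(alias(fit_near)$Complete)   # NULL: nothing is exactly aliased
print(alias(fit_near, partial = TRUE))

# Correlations between coefficient estimates close to +/-1 flag the culprits
cc <- cov2cor(vcov(fit_near))
print(round(cc, 3))
```

On the real model, `round(cov2cor(vcov(ESTF)), 2)` would be a 34 x 34 matrix; scanning it for entries near +/-1 is one way to find which E variables move together.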
