
I understand from this question that the coefficients are the same whether we use an lm regression with as.factor() dummies or a plm regression with fixed effects.

set.seed(123)  # not in the original run, so the exact digits below may differ
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)


model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
#  (Intercept)       -0.0522691  0.1422052   -0.368   0.7132    
#  a                  1.9982165  0.0101501  196.866   <2e-16 ***
#  b                 -1.4787359  0.0101666 -145.450   <2e-16 ***

library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))

model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)

# Coefficients :
#    Estimate Std. Error t-value  Pr(>|t|)    
# a  1.998217   0.010150  196.87 < 2.2e-16 ***
# b -1.478736   0.010167 -145.45 < 2.2e-16 ***

library(lfe)

model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)

# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# a  1.99822    0.01015   196.9   <2e-16 ***
# b -1.47874    0.01017  -145.4   <2e-16 ***
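The reason all three coefficient tables agree can be checked in base R without plm or lfe. The sketch below is an illustration, not the packages' internals. It assumes a balanced panel, as in the simulated data, where two-way fixed effects can be removed by subtracting region means and year means and adding back the grand mean. `demean2` is a hypothetical helper and `set.seed(42)` is an arbitrary seed. Regressing the demeaned `y` on the demeaned `a` and `b` (no intercept) should reproduce the `a` and `b` coefficients of the dummy-variable `lm`, by the Frisch–Waugh–Lovell theorem.

```r
# Sketch only: two-way within transformation in base R, assuming a balanced
# panel (every region observed in every year), as in the simulated data.
set.seed(42)  # arbitrary seed so the example is reproducible
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)

# Hypothetical helper: subtract group means of both factors, add back the
# grand mean. This equals exact two-way demeaning only for balanced panels.
demean2 <- function(x, g1, g2) x - ave(x, g1) - ave(x, g2) + mean(x)

y_w <- demean2(df$y, df$region, df$year)
a_w <- demean2(df$a, df$region, df$year)
b_w <- demean2(df$b, df$region, df$year)

# Demeaned variables have mean zero, so no intercept is fitted.
within_fit <- lm(y_w ~ 0 + a_w + b_w)
lsdv_fit   <- lm(y ~ a + b + factor(year) + factor(region), data = df)

# The two columns should agree up to floating-point error.
cbind(within = coef(within_fit), lsdv = coef(lsdv_fit)[c("a", "b")])
```

Only the point estimates carry over directly. The plain `lm` on demeaned data does not know that 199 parameters (99 region dummies, 99 year dummies, 1 intercept) were absorbed, so it uses too many residual degrees of freedom and its standard errors come out slightly too small. plm and felm make this correction, which is why their standard errors match the `lm` output above.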

However, the R-squared and adjusted R-squared values differ significantly. Which one is correct, and how does the interpretation change between the two models? In my case, the R-squared is much larger for the plm specification and is even negative for the lm + factor one.
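One way to see why the reported R-squared values can differ while the coefficients agree: both fits leave the same residual sum of squares, but they divide it by different total sums of squares. The sketch below computes both in base R. It reuses the simulated data and the hypothetical `demean2` helper, assumes a balanced panel, and assumes that the plm "within" R-squared is measured against the variation left in `y` after two-way demeaning. That correspondence is an assumption to check against `summary(model.b)$r.squared`. Adjusted R-squared additionally depends on how many absorbed parameters each package counts in the degrees of freedom, which this sketch does not reproduce.

```r
# Sketch only: two R-squared denominators for the same residuals.
set.seed(42)  # arbitrary seed
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)

# Hypothetical helper; exact two-way demeaning only for balanced panels.
demean2 <- function(x, g1, g2) x - ave(x, g1) - ave(x, g2) + mean(x)
y_w <- demean2(df$y, df$region, df$year)
a_w <- demean2(df$a, df$region, df$year)
b_w <- demean2(df$b, df$region, df$year)

lsdv_fit   <- lm(y ~ a + b + factor(year) + factor(region), data = df)
within_fit <- lm(y_w ~ 0 + a_w + b_w)

# Both fits leave the same residuals (Frisch-Waugh-Lovell).
rss   <- sum(residuals(lsdv_fit)^2)
rss_w <- sum(residuals(within_fit)^2)

# lm with dummies: variation of y around its overall mean.
tss_total  <- sum((df$y - mean(df$y))^2)
# Within model: variation of y left after removing region and year effects.
tss_within <- sum(y_w^2)

r2_lsdv   <- 1 - rss / tss_total   # counts variation explained by the dummies
r2_within <- 1 - rss / tss_within  # only variation explained by a and b

c(lsdv = r2_lsdv, within = r2_within,
  lm_reported = summary(lsdv_fit)$r.squared)
```

Neither number is "wrong"; they answer different questions. The first measures how much of the total variation in `y` is explained by `a`, `b` and the fixed effects together. The second measures how much of the within (demeaned) variation is explained by `a` and `b` alone. In this simulated data `y` does not depend on region or year, so the two are close. When the fixed effects absorb much of the variation in `y`, `tss_total` is much larger than `tss_within` and the two values diverge.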

Mat36
  • Suggest you use smaller data and then compare the model matrices. Also note that you need set.seed at the top in order to make this reproducible. – G. Grothendieck Mar 23 '21 at 13:37
  • Thanks, this data is only for illustrative purposes and is not the primary data of my research. – Mat36 Mar 23 '21 at 13:39
  • Please look at https://stackoverflow.com/questions/49058092/plm-vs-lm-different-results and https://stackoverflow.com/questions/47713727/degrees-of-freedom-panel-data-fixed-effects-plm – Helix123 Mar 23 '21 at 22:18
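Following the suggestion to use smaller, seeded data and compare model matrices, the sketch below builds a 5-region by 4-year balanced panel (sizes and seed chosen arbitrarily). It compares the design matrix `lm` uses for the dummy-variable model with the two demeaned regressors that, by assumption, correspond to the plm "twoways" within transformation for a balanced panel. The helper `dm` is hypothetical. The difference in column counts is what drives the different degrees of freedom, and therefore the different adjusted R-squared, in the linked questions.

```r
# Sketch only: compare the dummy-variable design with the demeaned design.
set.seed(1)  # arbitrary seed for reproducibility
n_region <- 5
n_year   <- 4
small <- data.frame(region = rep(1:n_region, each = n_year),
                    year   = rep(1:n_year, times = n_region))
small$a <- rnorm(nrow(small))
small$b <- rnorm(nrow(small))
small$y <- 2 * small$a - 1.5 * small$b + rnorm(nrow(small))

# Design used by lm(): intercept, a, b, 3 year dummies, 4 region dummies.
X_lsdv <- model.matrix(y ~ a + b + factor(year) + factor(region), data = small)
dim(X_lsdv)  # 20 rows, 1 + 2 + (4 - 1) + (5 - 1) = 10 columns

# Hypothetical helper: two-way demeaning, valid for this balanced panel.
dm <- function(x) x - ave(x, small$region) - ave(x, small$year) + mean(x)
X_within <- cbind(a = dm(small$a), b = dm(small$b))
dim(X_within)  # 20 rows, 2 columns
```

With 20 observations the dummy-variable `lm` has 20 - 10 = 10 residual degrees of freedom. A naive regression on the two demeaned columns would claim 18. A fixed-effects routine has to subtract the absorbed intercept and dummies itself to get back to 10, and the adjusted R-squared each package reports depends on how it does this.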
