Is there a way to include categorical variables (factors with several levels) when using plm() for pooled OLS? As far as I understand, all variables passed to plm() have to be numeric, which will not work in my case. I could include one dummy variable for each factor level; however, this would produce a large number of variables that are really just levels of a much smaller number of factors.

I've posed a similar question on CrossValidated and would be thankful for any kind of help.

I will include a minimal example if requested, but I assume this is more a general question on how to use plm() and lm().

Aki

1 Answer

You can easily include both numeric and categorical variables in both plm() and lm().

library(plm)
data("Males", package = "plm")

# Show the first six columns; union and ethn are factors
head(Males[1:6])
#   nr year school exper union  ethn
# 1 13 1980     14     1    no other
# 2 13 1981     14     2   yes other
# 3 13 1982     14     3    no other
# 4 13 1983     14     4    no other
# 5 13 1984     14     5    no other
# 6 13 1985     14     6    no other

# Ordinary lm(): the factors are expanded into dummy columns
coef(lm(wage ~ school + union + ethn, data = Males))
# (Intercept)      school    unionyes   ethnblack    ethnhisp 
#      0.7148      0.0767      0.1930     -0.1523      0.0134 

# Pooled OLS with plm(): same formula, same coefficients
coef(plm(wage ~ school + union + ethn, data = Males, model = "pooling"))
# (Intercept)      school    unionyes   ethnblack    ethnhisp 
#      0.7148      0.0767      0.1930     -0.1523      0.0134 

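As you can see, both calls accept factors directly. R expands union and ethn into dummy columns (unionyes, ethnblack, ethnhisp) automatically, with the first level of each factor serving as the reference category, so there is no need to create the dummies by hand.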
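If you still get errors, it may help to check how R encodes the variable. The following is only a minimal sketch on a made-up data frame (the names toy, y, x and group are invented for illustration and are not part of the Males data). It uses base R's model.matrix() to show the dummy columns a formula generates from a factor under the default treatment contrasts, and relevel() to change the reference level.

```r
# Toy data, invented purely for illustration (not taken from Males)
toy <- data.frame(
  y     = c(1.2, 0.8, 1.5, 1.1, 0.9, 1.7),
  x     = c(10, 12, 14, 11, 13, 15),
  group = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# A character column is not a factor yet; convert it explicitly
toy$group <- factor(toy$group)
levels(toy$group)
# "a" "b" "c"

# model.matrix() builds the design matrix for a formula: with default
# treatment contrasts there is one dummy per level except the first
X <- model.matrix(y ~ x + group, data = toy)
colnames(X)
# "(Intercept)" "x" "groupb" "groupc"

# relevel() changes which level is used as the reference category
toy$group <- relevel(toy$group, ref = "c")
colnames(model.matrix(y ~ x + group, data = toy))
# "(Intercept)" "x" "groupa" "groupb"
```

The coefficient names in the lm() and plm() output above (unionyes, ethnblack, ethnhisp) follow the same pattern: factor name plus level, with the reference level omitted.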

landroni
  • Thank you. I've read somewhere that it would not work and I used to get an error message when including factors, but obviously the problem was something else... – Aki Jan 15 '15 at 20:21
  • I would suspect that you're hitting your head against factor contrasts. See http://stackoverflow.com/questions/3445316/factors-in-r-more-than-an-annoyance and http://stackoverflow.com/questions/2352617/how-and-why-do-you-use-contrasts-in-r. – landroni Jan 15 '15 at 20:24