I'm working on a state and year fixed effects regression, which has 3 observations per state/year combo based on the race for that row (white, black, other) - See link below.
So far, I've been using the base lm function to estimate a fixed effects regression that accounts for all three races. I do this by using state, year and race all as factor variables. I am also running separate regressions for each individual race. The problem is that I would prefer to use the plm package so that I can get the within R-squared for the model with all races; however, it is giving me errors.

Edit: I included a picture of my data here. The data is a balanced panel: there are 34 states, 12 years (2003-2014) and 3 races for each state/year combo, so a total of 1224 observations (34 × 12 × 3).

Here is the code I'm using to run the plm regression:

library(plm)

# plm regression
plm.reg <- plm(drugcrime_ar ~ decrim_dummy + median_income + factor(race),
               data = my.data, index = c("st_name", "year"), model = "within",
               effect = "twoways")

The errors I get in return:

Error in pdim.default(index[[1]], index[[2]]): 
   duplicate couples (id-time) 
In addition: Warning messages: 
1: In pdata.frame(data, index) :
   duplicate couples (id-time) in resulting pdata.frame
   to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
   duplicate couples (id-time)
3: In is.pbalanced.default(index[[1]], index[[2]]) :
   duplicate couples (id-time)

Is there a workaround for this or am I out of luck?

dmunslow
  • Could you put a reproducible example? e.g.: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Edgar Santos Apr 20 '17 at 04:13
  • Show the layout of your data and how you create the pdata.frame and the estimation. – Helix123 Apr 20 '17 at 06:56
  • I edited my post and added the information you requested – dmunslow Apr 20 '17 at 16:39
  • It seems to me, you actually have some kind of nested panel structure. The development version of plm implements the nested model as in Baltagi/Song/Jung (2001) but I do not know if it is suitable for your situation. – Helix123 Apr 21 '17 at 12:46

1 Answer

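The plm function needs each id/time pair to identify exactly one row. With index = c("st_name", "year"), every state/year pair appears three times (once per race), which is what the "duplicate couples (id-time)" error is reporting.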
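To see the duplication concretely, here is a minimal base R sketch on made-up data. The `toy` data frame, with its two states and two years, is a hypothetical stand-in for `my.data`; only the column names are taken from the question.

```r
# Hypothetical toy data with the same layout as the question:
# one row per state/year/race combination.
toy <- expand.grid(
  st_name = c("AL", "AK"),
  year    = 2003:2004,
  race    = c("white", "black", "other"),
  stringsAsFactors = FALSE
)

# Count rows per (st_name, year) pair. plm expects every count to be 1;
# every cell is 3 here because each pair has one row per race.
pair_counts <- table(toy$st_name, toy$year)
print(pair_counts)

# TRUE means at least one id-time couple is duplicated.
has_duplicates <- any(pair_counts > 1)
print(has_duplicates)
```

Running the same `table()` call on the real `st_name` and `year` columns should show which pairs are repeated.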

If each st_name/race pair forms an "individual" (or whatever name you give to this dimension of the panel), then you could do:

library(dplyr)

# One integer id per st_name/race combination
my.data$id <- group_indices(my.data, st_name, race)
# This is what my.data %>% mutate(id = group_indices(st_name, race)) would do,
# if group_indices() could be used inside mutate().

plm.reg <- plm(drugcrime_ar ~ decrim_dummy + median_income + factor(race),
           data = my.data, index=c("id","year"), model = "within",
           effect = "twoways")

Note, however, that in this setup you are not using the kind of nested panel structure that @Helix123 suggested. You are only redefining the first dimension of the panel.
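If you would rather not depend on dplyr for this step, the same id can be sketched in base R with `interaction()`. This is an alternative I am swapping in, not what the code above does, and the `toy` data frame is again a hypothetical stand-in for `my.data`. As far as I know, dplyr 1.0.0 and later also deprecate passing grouping columns directly to `group_indices()`, so a base R version may be more stable across versions.

```r
# Hypothetical toy data: one row per state/year/race combination.
toy <- expand.grid(
  st_name = c("AL", "AK"),
  year    = 2003:2004,
  race    = c("white", "black", "other"),
  stringsAsFactors = FALSE
)

# One integer id per st_name/race combination.
# drop = TRUE discards combinations that never occur in the data.
toy$id <- as.integer(interaction(toy$st_name, toy$race, drop = TRUE))

# Number of distinct "individuals": 2 states x 3 races.
n_ids <- length(unique(toy$id))
print(n_ids)

# Check that (id, year) now identifies each row uniquely, which is
# the condition plm needs; anyDuplicated() returns 0 when it does.
n_dup <- anyDuplicated(toy[, c("id", "year")])
print(n_dup)
```

With the real data, `index = c("id", "year")` can then be passed to plm as above. Since race is constant within each id, the individual effects should absorb it, so I would expect the factor(race) terms to drop out of the within model.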

Rodrigo Remedio
  • This did exactly what I wanted it to! I had to slightly modify the call to group_indices as the first argument is the data frame and the groups come after - so my call was my.data$id <- group_indices(my.data, st_name, race). Thank you! – dmunslow Apr 24 '17 at 06:21