
I have a data set on evacuation that is essentially:

Start  End    Evac_num  date_time
loc_1  loc_2  2000      30-09-2020 16:00

Here Start is the starting location ID, End is the ID of the location people evacuate to, Evac_num is the number of evacuees, and date_time is the date and time at which the count was recorded. Each Start and End combination is repeated across several date/times.

I ran some OLS regressions with

r1 <- lm(y ~ x, data = df)

as well as fixed effects models with

fe1 <- felm(y ~ x | date_time, data=df)

and found, after running a Breusch-Pagan test, that my data are heteroskedastic. I then decided to fit some Generalised Least Squares (GLS) models to account for this. That works well for the OLS models, but I do not know how to add date_time fixed effects.
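For reference, the Breusch-Pagan check can be reproduced by hand in base R. This is a minimal sketch on simulated data (the data frame below is a hypothetical stand-in, not the evacuation data), computing the studentized (Koenker) form of the statistic as n times the R-squared of an auxiliary regression of the squared residuals on the regressors:

```r
# Simulated stand-in for the real data: the error spread grows with x.
set.seed(1)
n  <- 500
df <- data.frame(x = runif(n, 1, 10))
df$y <- 2 + 3 * df$x + rnorm(n, sd = 0.5 * df$x)

r1 <- lm(y ~ x, data = df)

# Auxiliary regression of the squared OLS residuals on the regressors.
df$resi2 <- residuals(r1)^2
aux <- lm(resi2 ~ x, data = df)

# Studentized Breusch-Pagan statistic: n * R^2, compared against a
# chi-squared distribution with one degree of freedom per auxiliary regressor.
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(statistic = bp_stat, df = bp_df, p.value = bp_p)
```

If the lmtest package is installed, `lmtest::bptest(r1)` with its default `studentize = TRUE` should report the same statistic.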

For the GLS models I did:

df$resi <- r1$residuals

varfunc.ols1 <- lm(log(resi^2) ~ x, data = df)

df$varfunc <- exp(varfunc.ols1$fitted.values)

# lm() expects weights proportional to the inverse of the error variance
r1.gls <- lm(y ~ x, weights = 1/varfunc, data = df)

summary(r1.gls)
summary(varfunc.ols1)
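One possible way to carry the weighting steps above over to a model with time fixed effects is to include date_time as a set of dummies in both the first-stage and the reweighted regression. The sketch below uses simulated data; every name and value in it is hypothetical. The weights are set to 1/varfunc because lm() documents its weights as inversely proportional to the variances.

```r
# Simulated stand-in for the real data (all names and values are hypothetical).
set.seed(42)
n_time <- 20   # number of distinct date_time stamps
n_obs  <- 30   # observations per stamp
df <- data.frame(
  date_time = factor(rep(seq_len(n_time), each = n_obs)),
  x         = runif(n_time * n_obs, 1, 10)
)
# One intercept shift per time stamp, plus noise whose spread grows with x.
time_effect <- rnorm(n_time)[as.integer(df$date_time)]
df$y <- 1 + 2 * df$x + time_effect + rnorm(nrow(df), sd = 0.4 * df$x)

# Step 1: first-stage regression with time dummies (fixed effects).
fe_ols <- lm(y ~ x + date_time, data = df)

# Step 2: model the log squared residuals to estimate the variance function.
df$resi    <- residuals(fe_ols)
varfunc_fe <- lm(log(resi^2) ~ x, data = df)
df$varfunc <- exp(fitted(varfunc_fe))

# Step 3: refit with inverse-variance weights, keeping the time dummies.
fe_fgls <- lm(y ~ x + date_time, weights = 1 / varfunc, data = df)

coef(summary(fe_fgls))["x", ]
```

With many time stamps the dummy approach becomes slow. `lfe::felm` also takes a `weights` argument, so passing the same weights to the `felm` call may be an alternative, though that is untested here.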

I'm not sure of the best way to run a GLS model with fixed effects in R. I looked into the pggls command in the plm package with something like:

fgls_1 <- pggls(y~x, data=df, model="within", effect="time", index=c("Start", "date_time"))

I was getting this error from the above model:

Warning: duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
Error in pdim.default(index[[1L]], index[[2L]]) : duplicate couples (id-time)

To deal with this issue, I combined the Start and End IDs into a single column (location_id), which is basically start.end (e.g. if Start was 123 and End was 234, it is now 123.234), because I thought the repetition of the Start ID was causing the duplicate error, as shown below:

fgls_1 <- pggls(y~x, data=df, model="within", effect="time", index=c("location_id", "date_time"))

but now I am getting the error that "duplicated row-names are not allowed".
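A quick way to check whether the combined ID and date_time still contain repeated pairs is to flag them in base R before handing the data to plm. The toy data frame below is hypothetical and only mimics the layout described above; the integer time index at the end is an optional extra, assuming the day-month-year format shown in the sample row.

```r
# Hypothetical toy data; the last row repeats the (Start, End, date_time)
# combination of the first row on purpose.
df <- data.frame(
  Start     = c("loc_1", "loc_1", "loc_2", "loc_1", "loc_1"),
  End       = c("loc_2", "loc_3", "loc_3", "loc_2", "loc_2"),
  Evac_num  = c(2000, 150, 75, 2100, 1900),
  date_time = c("30-09-2020 16:00", "30-09-2020 16:00", "30-09-2020 16:00",
                "30-09-2020 17:00", "30-09-2020 16:00"),
  stringsAsFactors = FALSE
)

# Combined panel ID, built the same way as location_id in the question.
df$location_id <- paste(df$Start, df$End, sep = ".")

# Flag every row whose (id, time) pair occurs more than once.
key      <- df[c("location_id", "date_time")]
is_dup   <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_rows <- df[is_dup, ]
dup_rows

# Optional: an integer time index ordered chronologically.
stamp      <- as.POSIXct(df$date_time, format = "%d-%m-%Y %H:%M", tz = "UTC")
df$time_id <- match(stamp, sort(unique(stamp)))
```

Any rows that show up in `dup_rows` would need to be aggregated or dropped before the pair can serve as a panel index.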

Does anyone have any idea how to handle this? Would it be better if I gave date and time separate columns? Or am I thinking about adding fixed effects to GLS all wrong?

  • You can fit Generalized Least Squares (GLS) with `nlme::gls`. – dipetkov Jul 31 '22 at 19:33
  • There are plenty of questions with answers about the duplicate couples message here on SO, e.g., https://stackoverflow.com/a/72092725/4640346. Also note that your `date_time` variable is likely in a format that might not work with `pggls` (it converts the index variables to factors). Best to go with integers for the time dimension. It is also easier to start by converting your data to a `pdata.frame` beforehand and looking at the converted data first. – Helix123 Aug 07 '22 at 06:52
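As a rough illustration of the `nlme::gls` suggestion in the first comment, the sketch below fits a model with time dummies and an exponential variance function in x, estimated jointly rather than in two steps. The data are simulated and all names are hypothetical; `varExp(form = ~ x)` is one possible variance structure, chosen here because it mirrors a log-linear variance model, not necessarily the right one for the evacuation data.

```r
library(nlme)  # recommended package, shipped with standard R installations

# Simulated stand-in for the real data (all names and values are hypothetical).
set.seed(7)
n_time <- 10
n_obs  <- 40
df <- data.frame(
  time_id = factor(rep(seq_len(n_time), each = n_obs)),
  x       = runif(n_time * n_obs, 1, 5)
)
shift <- rnorm(n_time)[as.integer(df$time_id)]   # one shift per time stamp
df$y  <- 1 + 2 * df$x + shift + rnorm(nrow(df), sd = exp(0.3 * df$x))

# GLS with time fixed effects as dummies and an error standard deviation
# that is modelled as an exponential function of x.
gls_fe <- gls(
  y ~ x + time_id,
  data    = df,
  weights = varExp(form = ~ x)
)

summary(gls_fe)$tTable["x", ]
```

Because the time effects enter as ordinary dummies, this route does not need a panel index, so the duplicate (id-time) restriction does not come into play.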
