Efficient creation of dummy variables in fixed effects regression

Question

I have panel data on 34 countries that records the days on which they committed military aid (in €). I am running a fixed effects regression to study how the sum of this aid changes over time depending on an independent dummy variable that codes the recipient's use of this military aid as successful (1) or not visibly successful (0). This independent variable relies on the date column.

To be clear, I want to run a regression with state fixed effects.

Since I'm measuring time in days, I believe the plm function needs a dummy variable for each day on which a country has not given any military aid, meaning that I need 365 dummy variables per year for each of the 34 donor countries.

Since the plm function does not handle NA values, I have had to recode the "empty" days without committed aid as "none". However, this causes R to interpret "none" as a state of its own that never gives any aid.

Currently, my dataset looks like this:

State	an_date	val_eur
Belgium	22/02/26	7600000
Slovakia	22/02/26	11000000
none	22/02/27	0

Subsequently, when I run this plm model the results are insignificant and the coefficient goes in the opposite direction of what is expected from previous data. t_sq is a squared time control variable.

plm(val_eur ~ success + t_sq, index="state", model="within", data=df)

I would highly appreciate any ideas as to how to create, or make R interpret all the required dummy variables for the regression!

I have tried looking inside the plm function for ways to make it create dummy variables the same way it creates dummy variables for the country fixed effects (by using index="state"), but I have not found any way.

Manually coding the dataset and adding approx 34*365 dummy variables seems like a bit of a coding nightmare.

EDIT, Some more info: when I use factor() to group by days, I get this error message "non-unique values when setting 'row.names'" as several countries commit aid on some dates.

MRE BELOW

Note that for some reason I get an error message when I try this plm regression saying that the model is empty. I do not get this error message in the original model.

#lubridate provides interval(), ymd() and %within% used further down
library(lubridate)

#Creating some base example data 
state <- c("Belgium","Slovakia","NA")
an_date <- as.Date(c("26/02/2022","26/02/2022","27/02/2022"), format = "%d/%m/%Y")
val_eur <- c(7600000, 11000000, 0)
df <- data.frame(state, an_date, val_eur)

#Creation of a variable telling amount of days since invasion 
inv_date <- as.Date("2022-02-24")
df$t <- difftime(df$an_date,inv_date, units ="days")
#creation of a square time control variable for the regression. 
df$t = as.numeric(df$t)
df$t_sq <- df$t^2

#Creating a time interval that the independent dummy variable uses. 
#bse means "battlefield success effects" and marks a 30 day time period 
#adding a time period for which the ind. var takes the value 1. 
bse <- interval(ymd("2022-02-27"), ymd("2022-03-04"))
df$bse <- df$an_date %within% bse
#Translating the TRUE/FALSE values to a dummy column for battlefield success effects 
df$bse <- as.integer(df$bse)

#attempt at regression
library(plm)

fe_mod <- plm(val_eur ~ bse + t, index=c("state"),
              model="within", data=df)

I think you need to adjust your data so that you will have a row with a country name, date and val_eur=0 for all day-country pairs where the country has not given any aid. Other than that, I would recommend you look at function feols from package fixest for more flexible fixed effects OLS implementation. — Otto Kässi, Apr 18 '23 at 11:37
Also please note that your question does not include a reproducible example. Have a look at this question https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for instructions on how to create one. At least include your raw data, and the transformations you have made. — Otto Kässi, Apr 18 '23 at 11:40
@OttoKässi Thanks for your input. I've added an MRE now that should help to show it further. Let me know if anything in it is lacking or not intuitive enough. You're correct about me needing to adjust the data so that there is a row with country name, date and val_eur=0 for all day-country pairs where the country has not given any aid. My problem/question is how I would go about doing this as there are 34 countries in the original dataset, spanning a time period of over a year? I've looked at feols, but I have not seen any way it would help me solve this particular problem. — Elmo_, Apr 18 '23 at 13:21
You are not clear if you want time fixed effects, state fixed effects or both? Also note, that your model will not be identified with the data you provide above because you have only 1 observation per year, and success is always 0. — Otto Kässi, Apr 19 '23 at 07:05
Thanks for the help, the success always being 0 is bound to be the problem. I am only interested in state fixed effects. The time variable is only included as a control variable and as a component of the independent variable as it only takes the value 1 in certain time periods. — Elmo_, Apr 19 '23 at 12:52
That's correct, the data in the MRE is not sufficient to make a FE regression. The data in the MRE is only there to show how the dataset is structured, which (this is the problem) either needs to be altered by manually adding a crazy amount of dummies, or finding a parameter in a plm() or feols() that can interpret certain time periods as dummies with less manual coding. The real dataset has 240 observations and 34 states. — Elmo_, Apr 19 '23 at 19:31
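A minimal base R sketch of why the MRE gives an empty model, as the comments above suggest. It assumes the "within" estimator subtracts each state's own mean from every variable; the helper name `demean` is made up for illustration, and the values are copied from the MRE. With only one row per state, nothing is left after demeaning.

```r
# Toy data mirroring the MRE: exactly one row per state
df <- data.frame(
  state   = c("Belgium", "Slovakia", "NA"),
  val_eur = c(7600000, 11000000, 0),
  bse     = c(0, 0, 1),
  t       = c(2, 2, 3)
)

# Within (fixed effects) transformation: subtract each state's own mean.
# `demean` is a hypothetical helper, not a plm function.
demean <- function(x, g) x - ave(x, g)

within_df <- data.frame(
  val_eur = demean(df$val_eur, df$state),
  bse     = demean(df$bse, df$state),
  t       = demean(df$t, df$state)
)

# Every column is zero, so there is no within-state variation to estimate from
print(within_df)
```

Once each state has several dated rows (including the zero-aid days), the demeaned columns are no longer all zero.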

score 0 · Answer 1 · answered Apr 18 '23 at 18:28


I did not 100% understand what you are trying to do, so I may be off track, but to me it sounds ridiculous to create hundreds of dummies for days in a year. Why not convert the dates into a continuous variable, "number of days since a specific event", and use this number in an OLS regression? E.g. transform February 23rd (54th day of the year) into 54.
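A short base R sketch of the two continuous encodings mentioned here, assuming the dates are already of class Date. The invasion date is the one used in the question; the example dates are illustrative.

```r
an_date <- as.Date(c("2022-02-23", "2022-02-26", "2022-02-27"))

# Day of the year (1-366); "%j" is the day-of-year format code
day_of_year <- as.numeric(format(an_date, "%j"))

# Days elapsed since a reference event (the invasion date from the question)
inv_date <- as.Date("2022-02-24")
days_since <- as.numeric(an_date - inv_date)

print(data.frame(an_date, day_of_year, days_since))
```

Day of year restarts every January, so for a panel that spans more than one calendar year the days-since-event version is probably the safer of the two.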


ajj


A variable like that already exists (see df$t in the code), but it cannot serve as the independent variable of interest, which is a dummy showing whether the date of the aid commitment falls within a certain time period (three periods to be exact, but in the example above only the period 2022-03-29 to 2022-05-04). With a continuous variable, the regression would not see any difference between a "normal" day and a day within one of these time periods. – Elmo_ Apr 18 '23 at 19:32
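One way to build a single dummy from several periods, sketched in base R. Only the first period comes from the MRE code; the other two are hypothetical placeholders, and `in_any_period` is a made-up helper name.

```r
# Periods during which the dummy should be 1 (inclusive bounds).
# Only the first row is taken from the MRE; the others are placeholders.
periods <- data.frame(
  start = as.Date(c("2022-02-27", "2022-04-01", "2022-09-06")),
  end   = as.Date(c("2022-03-04", "2022-04-10", "2022-09-12"))
)

an_date <- as.Date(c("2022-02-26", "2022-02-27", "2022-04-05", "2022-05-01"))

# Returns 1L if a date falls inside any of the periods, else 0L
in_any_period <- function(d, periods) {
  vapply(seq_along(d), function(i) {
    as.integer(any(d[i] >= periods$start & d[i] <= periods$end))
  }, integer(1))
}

bse <- in_any_period(an_date, periods)
print(data.frame(an_date, bse))
```

This yields one 0/1 column however many periods there are, so no per-day dummies are needed for the variable of interest.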

Otto Kässi · Accepted Answer · 2023-04-20T11:29:04.290

Your issue might be that your panel is not balanced. Something along these lines might be helpful:

#data
df <- structure(list(state = c("Belgium", "Slovakia", "NA"), an_date = structure(c(19049,
19049, 19050), class = "Date"), val_eur = c(7600000, 1.1e+07,
0), t = c(2, 2, 3), t_sq = c(4, 4, 9), bse = c(0L, 0L, 1L)), row.names = c(NA,
-3L), class = "data.frame")


# libraries
library(lubridate)
library(tidyverse)

## unique days and states (dropping the "NA" placeholder state)
all_states <- df %>% filter(state != "NA") %>% select(state) %>% unique()
all_dates <- df %>% select(an_date) %>% unique()

## expand to grid with all date/state combinations
x <- expand.grid(c(all_states, all_dates))

## spread df to balanced form and fill in the NAs
balanced_panel_df <- x %>%
  left_join(df, by = c("an_date", "state")) %>%
  mutate(
    t = ifelse(is.na(t), as.Date(an_date) - as.Date("2022-02-24"), t),
    t_sq = ifelse(is.na(t_sq), as.integer(as.Date(an_date) - as.Date("2022-02-24"))^2, t_sq),
    val_eur = ifelse(is.na(val_eur), 0, val_eur),
    bse = ifelse(an_date >= as.Date("2022-02-27") & an_date <= as.Date("2022-03-04"), 1, 0)
  )

The balanced panel looks as follows:

> balanced_panel_df
     state    an_date val_eur t t_sq bse
1  Belgium 2022-02-26 7.6e+06 2    4   0
2 Slovakia 2022-02-26 1.1e+07 2    4   0
3  Belgium 2022-02-27 0.0e+00 3    9   1
4 Slovakia 2022-02-27 0.0e+00 3    9   1

Here's how you could run a regression

library(fixest)
feols(val_eur ~ bse + t | state, data=balanced_panel_df)

If you really want to do time fixed effects, you can use

balanced_panel_df$t <- as.factor(balanced_panel_df$t)
feols(val_eur ~ bse| state + t, data=balanced_panel_df)
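On the original worry about hand-coding dummies: a base R sketch, on simulated data (all numbers invented), suggesting that state fixed effects never need manual dummy columns. Wrapping the state in `factor()` lets `lm()` build them, and the slope estimates match those from a regression on state-demeaned variables, which is the "within" approach. `demean` is a hypothetical helper.

```r
set.seed(1)

# Simulated balanced panel: 3 states, 20 days each (purely illustrative)
states <- rep(c("A", "B", "C"), each = 20)
t      <- rep(1:20, times = 3)
bse    <- rbinom(60, 1, 0.3)
alpha  <- unname(c(A = 5, B = -2, C = 10)[states])  # state-specific intercepts
y      <- alpha + 3 * bse + 0.1 * t + rnorm(60)

# Dummy-variable version: factor() creates the state dummies automatically
lsdv <- lm(y ~ bse + t + factor(states))

# Within version: demean every variable by state, then regress without intercept
demean <- function(x, g) x - ave(x, g)
fe_within <- lm(demean(y, states) ~ 0 + demean(bse, states) + demean(t, states))

# The slope coefficients on bse and t agree between the two approaches
print(coef(lsdv)[c("bse", "t")])
print(coef(fe_within))
```

The standard errors from the demeaned `lm()` fit are not adjusted for the absorbed state means, so for inference it is better to rely on `plm` or `feols`.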

I do not understand what the contents of the BSE column is supposed to show. Also, I wasn't sure what the desired regression specification is, `val_eur ~ bse + t` or `val_eur ~ success + t_sq`? — Otto Kässi, Apr 20 '23 at 11:25
First off, a big thank you for your answer. The balancing of the datasheet was exactly what was necessary. To answer the question about the desired regression specification, both. success/bse are actually exactly the same, it was my fault for using both. I thought success might be more intuitive but then I obviously ended up using BSE anyways. The BSE column simply measures certain dates that are characterized by Ukrainian battlefield successes, but the example time period I put in the MRE is not actually representative of the real dates used. — Elmo_, Apr 21 '23 at 08:52
