I wish to evaluate marginal effects of variables in a logit regression using a dataset like this (with 40k observations):

d1<- structure(list(dummy.eleito = c(1, 0, 0, 0, 0, 1, 1, 1, 1, 0), 
                     dummy.tratamento = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0), 
                     Escolaridade = c("SUPERIOR_INCOMPLETO", "FUNDAMENTAL_INCOMPLETO", 
                                      "SUPERIOR_COMPLETO", "FUNDAMENTAL_INCOMPLETO", 
                                     "SUPERIOR_COMPLETO", "SUPERIOR_COMPLETO", "SUPERIOR_INCOMPLETO", 
                                     "SUPERIOR_INCOMPLETO", "SUPERIOR_COMPLETO", "SUPERIOR_INCOMPLETO"), 
                     Raca = c("Preta_Parda", "Preta_Parda", "Preta_Parda", "Preta_Parda", 
                              "Preta_Parda", "Preta_Parda", "BRANCA", "BRANCA", "BRANCA", "BRANCA"),
                     DESCRICAO_SEXO = c("MASCULINO", "MASCULINO", "MASCULINO", 
                                        "MASCULINO", "MASCULINO", "MASCULINO", "MASCULINO", 
                                        "MASCULINO", "MASCULINO", "MASCULINO"), 
                     votos.cidade = c(6483, 6483, 6483, 6483, 6483, 6483, 4735, 
                                      4735, 4735, 4735), 
                     dummy.prefeito = c(0,1, 0, 0, 0, 1, 0, 0, 0, 1), 
                     Intensidade.Trat0.Mun = c(0.0152671755725191, 0.0152671755725191, 0.0152671755725191, 0.0152671751, 
                                               0.0152671755725191, 0.01526717, 0.02857142856, 0.028571428, 0.028571, 0.0285714), 
                     Var.Receitas = c(3.25607407, 11.424, 4.5549, -0.832116880227985, 5.78901737320675, -0.02459246, 
                                      1.151009, -0.3058719238, 0.742947247, -0.2711)), 
                .Names = c("dummy.eleito", "dummy.tratamento", "Escolaridade", "Raca", 
                           "DESCRICAO_SEXO", "votos.cidade", "dummy.prefeito", "Intensidade.Trat0.Mun", 
                           "Var.Receitas"), row.names = c(NA, 10L), class = "data.frame")

I run the following regression using glm:

model <- glm(dummy.eleito ~  dummy.tratamento + factor(Escolaridade) +
                       factor(Raca) + factor(DESCRICAO_SEXO) +
                       votos.cidade + dummy.prefeito +
                       dummy.tratamento:Intensidade.Trat0.Mun +
                       Var.Receitas + Var.Receitas:dummy.tratamento, 
                       data = d1, 
                       family = binomial(link = 'logit'))

Then I evaluate marginal effects at some points:

library(margins)

m <- margins(model, at = list(dummy.tratamento = 1,
                              Intensidade.Trat0.Mun = fivenum(d1$Intensidade.Trat0.Mun),
                              Var.Receitas = fivenum(d1$Var.Receitas)))

R ran this all night, and by morning it still had not finished. Is that normal? What could be causing it? Is the data too complex, or is the regression formula itself the problem? Even when I run margins without the at argument, it never finishes.

Any help?


EDIT:

After updating R to its newest version, this is what I got:

Running the regressions I needed and the margins command on the entire dataset took a while, but R did finish in the end.

However, the problem persisted when using the at argument inside margins. I suspect this is because the regression has factor variables. I will probably calculate predicted values of my dependent variable by hand, using the values I would have passed to at, just to get a grasp of the results.
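The "by hand" route mentioned above could look roughly like the following sketch in base R. It does not use the margins package; the simulated data and the names `sim`, `treat`, `grp`, `x` and `ame_at` are all hypothetical stand-ins, not the real variables. It fixes the treatment dummy at 1, sets the continuous variable to each of its five-number-summary values, and approximates the average marginal effect with a central finite difference on the predicted probabilities.

```r
# Hypothetical simulated data; every name here is illustrative only
set.seed(1)
n <- 500
sim <- data.frame(
  treat = rbinom(n, 1, 0.4),
  x     = rnorm(n),
  grp   = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
sim$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * sim$treat + 0.6 * sim$x -
                               0.4 * sim$treat * sim$x))

fit <- glm(y ~ treat + grp + x + x:treat, data = sim,
           family = binomial(link = "logit"))

# Average marginal effect of x with treat fixed at 1 and x fixed at x_value;
# the remaining covariates keep their observed values
ame_at <- function(fit, data, x_value, h = 1e-5) {
  lo <- data
  hi <- data
  lo$treat <- 1
  hi$treat <- 1
  lo$x <- x_value - h
  hi$x <- x_value + h
  p_lo <- predict(fit, newdata = lo, type = "response")
  p_hi <- predict(fit, newdata = hi, type = "response")
  mean((p_hi - p_lo) / (2 * h))  # central difference, averaged over rows
}

x_points <- fivenum(sim$x)
effects  <- sapply(x_points, function(v) ame_at(fit, sim, v))
data.frame(x = x_points, dydx = effects)
```

This gives point estimates only; standard errors would need the delta method or a bootstrap on top.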

Any suggested alternatives are welcome.

  • Including a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your question will increase your chances of getting an answer. – Samuel Oct 19 '17 at 01:21
  • @jsb Just included one with a fraction of the original dataset – Arthur Carvalho Brito Oct 19 '17 at 01:54

2 Answers


I think I have found the problem. Your code produced an error because you had a factor DESCRICAO_SEXO with only one level:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
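One way to spot such columns before fitting is to count the distinct non-missing values in each column. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `y`, `raca` and `sexo`) rather than the asker's data:

```r
# Hypothetical toy data frame; `sexo` has a single distinct value
toy <- data.frame(
  y    = c(1, 0, 0, 1, 1, 0),
  raca = c("A", "A", "B", "B", "A", "B"),
  sexo = rep("MASCULINO", 6),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing values per column
n_distinct <- vapply(toy, function(col) length(unique(col[!is.na(col)])),
                     integer(1))

# Columns that cannot be used as factors in a model formula
constant_cols <- names(n_distinct)[n_distinct < 2]
constant_cols

# Keep only the columns that actually vary
toy_ok <- toy[, setdiff(names(toy), constant_cols), drop = FALSE]
```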

Next, I suggest you create factors outside your glm call:

d1$dummy.eleito <- as.factor(d1$dummy.eleito)
d1$dummy.tratamento <- as.factor(d1$dummy.tratamento)
d1$Escolaridade <- as.factor(d1$Escolaridade)
d1$Raca <- as.factor(d1$Raca)
d1$DESCRICAO_SEXO <- as.factor(d1$DESCRICAO_SEXO)
d1$dummy.prefeito <- as.factor(d1$dummy.prefeito)

Running the following model (without DESCRICAO_SEXO) works:

model <- glm(dummy.eleito ~  dummy.tratamento + Escolaridade + 
 Raca + votos.cidade + dummy.prefeito + Intensidade.Trat0.Mun + 
   Var.Receitas, data = d1, family = binomial(link = 'logit'))

However, it still throws the following warning:

Warning message: glm.fit: fitted probabilities numerically 0 or 1 occurred

You can read about this warning here and here. It may appear only with the small sample you provided and not with the full dataset, so you will have to try it and see.
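For context, this warning typically shows up when some combination of predictors separates the outcome perfectly, which is easy to hit with only a handful of rows. A minimal sketch on made-up data (the data frame `toy` is hypothetical) that should reproduce it and shows how to inspect the fitted probabilities:

```r
# Hypothetical toy data: y is perfectly predicted by the sign of x
toy <- data.frame(x = c(-3, -2, -1, 1, 2, 3),
                  y = c(0, 0, 0, 1, 1, 1))

# Collect warnings raised while fitting instead of printing them
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, data = toy, family = binomial(link = "logit")),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs                     # warnings emitted by glm.fit
range(fitted(fit))       # fitted probabilities pushed towards 0 and 1
table(toy$y, toy$x > 0)  # empty off-diagonal cells indicate separation
```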

Samuel
  • Running this regression with the full dataset I do not get this warning. However, my main issue persists: the `margins` line of code just won't run – Arthur Carvalho Brito Oct 19 '17 at 02:29
  • What error do you get? I'm not familiar with the `margins` function. – Samuel Oct 19 '17 at 03:05
  • `margins` is a function from the package of the same name (https://cran.r-project.org/web/packages/margins/vignettes/TechnicalDetails.pdf). It is not actually an error: R just keeps running the command without ever finishing. It does not crash, it just goes on forever – Arthur Carvalho Brito Oct 19 '17 at 03:10
  • @ArthurCarvalhoBrito Based on the behavior you describe, you have a very large dataset and it's just taking a long time to run. margins is, unfortunately, quite slow (e.g., compared to its Stata analogue). Wait it out. – Thomas Oct 19 '17 at 07:41
  • Wait the whole night? I tried it. And the same would happen using only 10 observations of the dataset – Arthur Carvalho Brito Oct 19 '17 at 10:56
  • By this post and conversation I get the impression that you are not getting any warning or error messages? Are you using RStudio? Also make sure you have no `NA`s in your data. – Samuel Oct 19 '17 at 11:01
  • Try anything, reinstalling the package, upgrade or reinstall base R, different computer, different OS, etc. Troubleshoot as much as you can. – Samuel Oct 19 '17 at 11:03
  • @jsb I will try reinstalling later. No error messages... RStudio just keeps on trying forever. Using RStudio, and the dataset has no `NA` values – Arthur Carvalho Brito Oct 19 '17 at 12:52
  • @Thomas I have read (https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html) that `margins` does not handle the `at` parameter well in regressions with factors. Is that true? – Arthur Carvalho Brito Oct 20 '17 at 02:01
  • If it is written on CRAN it is very likely true. – Samuel Oct 20 '17 at 02:35

I was having the same problem and did two things to fix it. First, I updated R to the newest version. Then I created a new data frame containing every combination of the variables I was interested in, instead of using my original data frame, which had over 300,000 observations. For example:

# One row per (Escolaridade, dummy.eleito) pair; factor() guards against a character column
edu_levels <- levels(factor(d1$Escolaridade))
newdata <- data.frame(dummy.eleito = rep(c(0, 1), times = length(edu_levels)),
                      Escolaridade = rep(edu_levels, each = 2))

Then I used margins on the new data set, so it gave me the marginal effects for all of the combinations I was interested in, and it did not take so long.
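A grid of combinations like the one described above can also be built with `expand.grid()`. The sketch below is only an illustration on simulated data (the names `sim`, `treat`, `educ` and `x` are hypothetical), and it evaluates predicted probabilities with base R's `predict()` rather than calling margins, so it shows the small-grid idea and not the exact steps of this answer:

```r
# Hypothetical simulated data; variable names are illustrative only
set.seed(42)
n <- 300
sim <- data.frame(
  treat = rbinom(n, 1, 0.5),
  educ  = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  x     = rnorm(n)
)
sim$y <- rbinom(n, 1, plogis(-0.3 + 0.7 * sim$treat + 0.5 * sim$x))

fit <- glm(y ~ treat + educ + x, data = sim,
           family = binomial(link = "logit"))

# Every combination of the values of interest: 1 x 3 x 5 = 15 rows
grid <- expand.grid(treat = 1,
                    educ  = levels(sim$educ),
                    x     = fivenum(sim$x))

# Predicted probability for each combination
grid$p_hat <- predict(fit, newdata = grid, type = "response")
head(grid)
```

Because the grid has a few rows instead of hundreds of thousands, anything computed per row is correspondingly cheaper.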

Isaac Fratti