Is there a specific reason why R GLM does not return warnings, while anova(glm) does?

Question

I wanted to look at contrasts within the following model:

is_service ~ action_count * document_entropy

The code below loads the full dataset.

The data look like this:

> str(dat)
'data.frame':   6432 obs. of  3 variables:
 $ action_count    : num  0.0759 0.1505 0.1435 0.1535 0.2067 ...
 $ document_entropy: num  -0.667 -0.667 -0.667 -0.667 -0.667 ...
 $ is_service      : int  0 0 0 0 0 0 0 0 0 0 ...

The target column is binary, with the following class counts:

> table(dat$is_service)

   0    1 
6291  141

Input columns are z-normalized. [Figure: distributions of the input columns, not preserved.]

Interestingly, when I fit this model (first part of the code), the procedure finishes without any warnings.

However, when I compare nested models with stats::anova (second part of the code), it does return warnings.

Question: Why does this happen, and which is more alarming: a warning from the single model fit or one from the anova analysis of it?

list.of.packages <- c('RCurl')
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

library(RCurl)

x <- getURL("https://rawgit.com/alexmosc/FX_Big_Experiment/master/service_train_saved.csv")
dat <- read.csv(text = x)
dat$X <- NULL

str(dat)
# first part: fit the full model, including the interaction term
summary(
     glm(formula = is_service ~ action_count * document_entropy
         , family = binomial(link = 'logit'),
         data = dat
     )
)
# second part: sequential comparison of four nested models
anova(
     glm(formula = is_service ~ 1
         , family = binomial(link = 'logit')
         , data = dat
     )
     , glm(formula = is_service ~ action_count
           , family = binomial(link = 'logit')
           , data = dat
     )
     , glm(formula = is_service ~ action_count + document_entropy
           , family = binomial(link = 'logit')
           , data = dat
     )
     , glm(formula = is_service ~ action_count + document_entropy + action_count:document_entropy
           , family = binomial(link = 'logit')
           , data = dat
     )
     , test = "Chisq"
)
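
The second part fits four separate models inside the anova() call, so a warning reported there most likely comes from one of those glm() fits rather than from the comparison step. One way to find out which fit is responsible is to fit each formula on its own and record the warnings it raises. What follows is a minimal sketch, not the original analysis: `fit_with_warnings` is a hypothetical helper, and `sim` is a tiny simulated, perfectly separated dataset standing in for `dat`.

```r
# Hypothetical helper (not part of the original post): fit a logistic
# regression and collect any warnings raised during the fit.
fit_with_warnings <- function(formula, data) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    glm(formula, family = binomial(link = "logit"), data = data),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # remember the message
      invokeRestart("muffleWarning")         # keep fitting, do not print it
    }
  )
  list(fit = fit, warnings = msgs)
}

# Simulated stand-in for `dat` (assumption): y is perfectly separated by x1.
sim <- data.frame(
  x1 = c(-3, -2, -1, 1, 2, 3),
  y  = c(0, 0, 0, 1, 1, 1)
)

# Fit each candidate model separately and count the warnings per model.
formulas <- list(null = y ~ 1, x1 = y ~ x1)
results <- lapply(formulas, fit_with_warnings, data = sim)
warning_counts <- vapply(results, function(r) length(r$warnings), integer(1))
print(warning_counts)
```

Applied to the real data, the list of formulas would hold the four models from the second part, which should show directly which of them emit warnings.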

It would be easier to help if you provided a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we could run and test the code. — MrFlick, Oct 23 '17 at 17:24
I don't think this is really programming related. The warnings only come from the middle two models in your `anova()` list and don't occur when you fit the full model. This warning is discussed here: https://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression. For better model suggestions this might be a better fit for [stats.se] since the problem is really statistical in nature. — MrFlick, Oct 23 '17 at 19:29
Alright, I see there can be problems with convergence as the data are unbalanced and heavy tailed. It struck me as a surprise that I did not get warnings while fitting one model with glm. I suppose this is more R related, but I might be wrong. Is the anova warning as alarming as a warning from the plain glm fit would be? Are you a moderator, and are you able to migrate my question to Cross Validated? Thank you for your answer. — Alexey Burnakov, Oct 23 '17 at 20:31
I started the question at CV: https://stats.stackexchange.com/questions/309595/is-there-a-specific-reason-why-r-glm-does-not-return-warnings-while-anovaglm — Alexey Burnakov, Oct 24 '17 at 09:09
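
The thread linked in the comments concerns perfect separation. Whether a given fit shows symptoms of it can be checked on the fitted object itself, whether or not a warning was noticed. This is a rough sketch, assuming the warning in question is the usual "fitted probabilities numerically 0 or 1 occurred" message (the post does not quote it). `separation_flags` is a hypothetical helper, the tolerance `eps` is an assumption meant to mirror the one used inside glm.fit, and the data are simulated.

```r
# Hypothetical helper (not from the original post): summarize symptoms of
# (quasi-)separation in a fitted binomial glm.
separation_flags <- function(fit, eps = 10 * .Machine$double.eps) {
  p <- fitted(fit)
  list(
    converged    = fit$converged,                   # did IRLS report convergence?
    n_extreme    = sum(p < eps | p > 1 - eps),      # fitted probs numerically 0 or 1
    max_abs_coef = max(abs(coef(fit)))              # very large values are suspicious
  )
}

# Simulated, perfectly separated toy data (assumption, not the real dataset).
toy <- data.frame(x = c(-3, -2, -1, 1, 2, 3), y = c(0, 0, 0, 1, 1, 1))

# Warnings are suppressed here only because the flags are inspected explicitly.
fit <- suppressWarnings(
  glm(y ~ x, family = binomial(link = "logit"), data = toy)
)
flags <- separation_flags(fit)
print(flags)
```

A nonzero `n_extreme` or a very large coefficient suggests that the coefficient estimates and their standard errors for that model should not be trusted, and that any comparison built on that fit inherits the problem.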
