I have wanted to see contrasts inside a specified model:
is_service ~ action_count * document_entropy
The full dataset is loaded in the code.
Overall the data are these:
> str(dat)
'data.frame': 6432 obs. of 3 variables:
$ action_count : num 0.0759 0.1505 0.1435 0.1535 0.2067 ...
$ document_entropy: num -0.667 -0.667 -0.667 -0.667 -0.667 ...
$ is_service : int 0 0 0 0 0 0 0 0 0 0 ...
The target column has this binomial distribution:
> table(dat$is_service)
0 1
6291 141
Input columns are z-normalized and distributed as follows:
It is interesting to see that when I fit this model (1st part of the code) the procedure ends without a warnings.
However, when I run contrasts with the stats::anova
(2nd part of code) it does return warnings.
Question: Why is that happening, and which level is more alarming: single model or the anova analysis of it?
list.of.packages <- c('RCurl')
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
library(RCurl)
x <- getURL("https://rawgit.com/alexmosc/FX_Big_Experiment/master/service_train_saved.csv")
dat <- read.csv(text = x)
dat$X <- NULL
str(dat)
# first part
summary(
glm(formula = is_service ~ action_count * document_entropy
, family = binomial(link = 'logit'),
data = dat
)
)
# second part
anova(
glm(formula = is_service ~ 1
, family = binomial(link = 'logit')
, data = dat
)
, glm(formula = is_service ~ action_count
, family = binomial(link = 'logit')
, data = dat
)
, glm(formula = is_service ~ action_count + document_entropy
, family = binomial(link = 'logit')
, data = dat
)
, glm(formula = is_service ~ action_count + document_entropy + action_count:document_entropy
, family = binomial(link = 'logit')
, data = dat
)
, test = "Chisq"
)