I am trying to predict the frequency of an outcome and I have a lot of data. I have already fitted a glm to the data, and now I am trying to use ctree to find any complex interactions in the dataset that I may have missed.
Instead of predicting the residual directly, I have tried to use the glm prediction as an offset in the ctree model. However, I seem to get the same results whether I: (a) use no offset at all, (b) pass the offset through the offset argument of the function, or (c) include the offset in the ctree formula.
I have tried looking at the documentation (here and here), but I have not found it helpful.
I have created some dummy data to mimic what I am doing:
library(partykit)
# Set random number seed
set.seed(15)
# Create Dataset
freq <- rpois(10000, 1.2)
example_df <- data.frame(var_1 = rnorm(10000, 180, 20) * freq / 10,
                         var_2 = runif(10000, 1, 8),
                         var_3 = runif(10000, 1, 2.5) + freq / 1000)
example_df$var_4 <- example_df$var_1 * example_df$var_3 + rnorm(10000, 0.1, 0.5)
example_df$var_5 <- example_df$var_2 * example_df$var_3 + rnorm(10000, 2, 50)
# Create GLM
base_mod <- glm(freq ~ ., family="poisson", data=example_df)
# Linear predictor on the link (log) scale, used as the offset below
base_pred <- predict(base_mod)
# Create trees
# (a) No offset
exc_offset <- ctree(freq ~ ., data = example_df, control = ctree_control(alpha = 0.01, minbucket = 1000))
# (b) Offset passed through the offset argument
func_offset <- ctree(freq ~ ., data = example_df, offset = base_pred, control = ctree_control(alpha = 0.01, minbucket = 1000))
# (c) Offset included in the formula
equ_offset <- ctree(freq ~ . + offset(base_pred), data = example_df, control = ctree_control(alpha = 0.01, minbucket = 1000))
I expected the trees fitted with an offset to differ from the tree fitted without one. However, the outputs seem to be the same:
# Predict outcomes
summary(predict(exc_offset, example_df))
summary(predict(func_offset, example_df))
summary(predict(equ_offset, example_df))
# Show trees
exc_offset
func_offset
equ_offset
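To check this programmatically rather than by eye, something like the sketch below could be used. Note that `same_tree` is a hypothetical helper of my own, not a partykit function, and it only compares predictions and node counts, so it is a rough check rather than a full structural comparison. The sanity check uses the built-in `cars` data purely for illustration.

```r
library(partykit)

# Hypothetical helper (not part of partykit): TRUE when two fitted trees
# give the same predictions on `data` and have the same number of nodes.
same_tree <- function(tree_a, tree_b, data) {
  same_pred <- isTRUE(all.equal(predict(tree_a, newdata = data),
                                predict(tree_b, newdata = data)))
  same_size <- length(tree_a) == length(tree_b)
  same_pred && same_size
}

# Sanity check on a built-in dataset: two identical calls should agree
t1 <- ctree(dist ~ speed, data = cars)
t2 <- ctree(dist ~ speed, data = cars)
stopifnot(same_tree(t1, t2, cars))
```

With the objects from my example, the calls would be `same_tree(exc_offset, func_offset, example_df)` and `same_tree(exc_offset, equ_offset, example_df)`.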
Does anyone know what is going on? How should I use the offsets?