I'm working out the output of a model generated with glm
. The model output is stored in a nested tibble. I want to calculate confidence interval through the transformation from type
= "link" to inverse-link (using $family$linkinv
). However, I can't get it to work with dplyr::mutate
in a nested tibble because the way of pulling the $family$linkinv
is from the model object using model$family$linkinv(x)
, which seems not to work as intended in the nested format.
Background
This current question is based on a previous question (and chosen answer) I posted about testing the level of liking fruits by different predictors using a linear model. I conduct a research to figure out which fruit is more likeable: mango, banana, or apple. To this end, I go ahead and sample 100 people at random. I ask them to rate, on a scale of 1-5, the degree of liking each of the fruits.
While the previous question had to do with lm
, here I'm trying to utilize quasibinomial glm
. The issue is that I want to get confidence intervals but my method (glm %>% predict
) outputs SE in "link space", therefore I have to go through a conversion process (detailed in this SO answer) to get what I want.
Data
library(tidyverse)
library(magrittr)
set.seed(123)
fruit_liking_df <-
data.frame(
id = 1:100,
i_love_apple = sample(c(1:5), 100, replace = TRUE),
i_love_banana = sample(c(1:5), 100, replace = TRUE),
i_love_mango = sample(c(1:5), 100, replace = TRUE),
age = sample(c(20:70), 100, replace = TRUE),
is_male = sample(c(0, 1), 100, prob = c(0.2, 0.8), replace = TRUE),
education_level = sample(c(1:4), 100, replace = TRUE),
is_colorblinded = sample(c(0, 1), 100, replace = TRUE)
)
> as_tibble(fruit_liking_df)
## # A tibble: 100 x 8
## id i_love_apple i_love_banana i_love_mango age is_male education_level is_colorblinded
## <int> <int> <int> <int> <int> <dbl> <int> <dbl>
## 1 1 3 5 2 50 1 2 0
## 2 2 3 3 1 49 1 1 0
## 3 3 2 1 5 70 1 1 1
## 4 4 2 2 5 41 1 3 1
## 5 5 3 1 1 49 1 4 0
## 6 6 5 2 1 29 0 1 0
## 7 7 4 5 5 35 1 3 0
## 8 8 1 3 5 24 0 3 0
## 9 9 2 4 2 55 1 2 0
## 10 10 3 4 2 69 1 4 0
## # ... with 90 more rows
I want to test my data in a per cent scale, so first I transform it by subtracting 1 and then dividing by 4:
fruit_liking_df %<>%
mutate_at(vars(starts_with("i_love_")), ~ subtract(., 1) %>% divide_by(., 4))
> as_tibble(fruit_liking_df)
## # A tibble: 100 x 8
## id i_love_apple i_love_banana i_love_mango age is_male education_level is_colorblinded
## <int> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl>
## 1 1 0.5 1 0.25 50 1 2 0
## 2 2 0.5 0.5 0 49 1 1 0
## 3 3 0.25 0 1 70 1 1 1
## 4 4 0.25 0.25 1 41 1 3 1
## 5 5 0.5 0 0 49 1 4 0
## 6 6 1 0.25 0 29 0 1 0
## 7 7 0.75 1 1 35 1 3 0
## 8 8 0 0.5 1 24 0 3 0
## 9 9 0.25 0.75 0.25 55 1 2 0
## 10 10 0.5 0.75 0.25 69 1 4 0
## # ... with 90 more rows
Now I use a pipe to run the glm model for each fruit, get SE in link space, and convert SE to CI
## will be needed later
my_new_data_for_pred <- expand_grid(
age = 45,
is_male = .5,
education_level = 2.5,
is_colorblinded = 0.5
)
## will be needed later
critval <- 1.96
model_fits_grouped <-
fruit_liking_df %>%
pivot_longer(starts_with("i_love"), values_to = "fruit") %>%
group_by(name) %>%
tidyr::nest() %>%
mutate(model_fit = map(
data,
~ glm(
data = .x,
fruit ~ I(age - 45) +
I((age - 45) ^ 2) +
I(is_male - .5) +
I(education_level - 2) +
is_colorblinded,
family = quasibinomial
)
)) %>%
mutate(predicted_values = map(
model_fit,
~ bind_cols(my_new_data_for_pred,
as.data.frame(
predict(
newdata = my_new_data_for_pred,
.x,
type = "link",
interval = "confidence",
level = 0.95,
se.fit = T
)
)) %>%
rowwise() %>%
mutate(
estimate = fit,
lower_ci_link = fit - critval * se.fit,
upper_ci_link = fit + critval * se.fit
)
))
> model_fits_grouped
## # A tibble: 3 x 4
## # Groups: name [3]
## name data model_fit predicted_values
## <chr> <list> <list> <list>
## 1 i_love_apple <tibble [100 x 6]> <glm> <tibble [1 x 10]>
## 2 i_love_banana <tibble [100 x 6]> <glm> <tibble [1 x 10]>
## 3 i_love_mango <tibble [100 x 6]> <glm> <tibble [1 x 10]>
Unnesting the predicted_values
gets:
> model_fits_grouped %>% unnest(predicted_values)
## # A tibble: 3 x 13
## # Groups: name [3]
## name data model_fit age is_male education_level is_colorblinded fit se.fit residual.scale estimate lower_ci_link upper_ci_link
## <chr> <list> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 i_love_apple <tibble [100 x 6~ <glm> 45 0.5 2.5 0.5 0.0843 0.261 0.709 0.0843 -0.427 0.595
## 2 i_love_banana <tibble [100 x 6~ <glm> 45 0.5 2.5 0.5 -0.0718 0.286 0.781 -0.0718 -0.633 0.489
## 3 i_love_mango <tibble [100 x 6~ <glm> 45 0.5 2.5 0.5 -0.140 0.279 0.762 -0.140 -0.687 0.407
Here's the problem: Now I want to mutate two more columns within predicted_values
for the inverse-link transformation for lower_ci_link
and upper_ci_link
, but this fails
model_fits_grouped <-
fruit_liking_df %>%
pivot_longer(starts_with("i_love"), values_to = "fruit") %>%
group_by(name) %>%
tidyr::nest() %>%
mutate(model_fit = map(
data,
~ glm(
data = .x,
fruit ~ I(age - 45) +
I((age - 45) ^ 2) +
I(is_male - .5) +
I(education_level - 2) +
is_colorblinded,
family = quasibinomial
)
)) %>%
mutate(predicted_values = map(
model_fit,
~ bind_cols(my_new_data_for_pred,
as.data.frame(
predict(
newdata = my_new_data_for_pred,
.x,
type = "link",
interval = "confidence",
level = 0.95,
se.fit = T
)
)) %>%
rowwise() %>%
mutate(
estimate = fit,
lower_ci_link = fit - critval * se.fit,
upper_ci_link = fit + critval * se.fit
) %>%
######################### this addition fails ###########################
mutate(
lower_ci_inverse_link = model_fit$family$linkinv(lower_ci_link),
upper_ci_inverse_link = model_fit$family$linkinv(upper_ci_link)
)
#########################################################################
))
And I get:
Error: Problem with
mutate()
inputpredicted_values
. x Problem withmutate()
inputlower_ci_inverse_link
. x attempt to apply non-function i Inputlower_ci_inverse_link
ismodel_fit$family$linkinv(lower_ci_link)
. i The error occurred in row
- i Input
predicted_values
ismap(...)
. i The error occurred in row 1.
I assume that the problem is that I'm trying to mutate new columns within predicted_values
, but using model_fit$family$linkinv(lower_ci_link)
refers to model_fit
which is in a higher level in the nested tibble.
Bottom line question
How can I get to mutate inverse-link columns within predicted_values
using model_fit$family$linkinv(lower_ci_link)
and model_fit$family$linkinv(upper_ci_link)
to ultimately get (scroll all the way to the rightest two columns):
> model_fits_grouped %>% unnest(predicted_values)
## # A tibble: 3 x 15
## # Groups: name [3]
## name data model_fit age is_male education_level is_colorblinded fit se.fit residual.scale estimate lower_ci_link upper_ci_link lower_ci_inverse_link_*DEMO* upper_ci_inverse_link_*DEMO*
## <chr> <list> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 i_love_apple <tibble [100 x 6]> <glm> 45 0.5 2.5 0.5 0.521 0.0632 0.349 0.521 0.397 0.645 0.111 0.111
## 2 i_love_banana <tibble [100 x 6]> <glm> 45 0.5 2.5 0.5 0.482 0.0701 0.387 0.482 0.345 0.620 0.222 0.222
## 3 i_love_mango <tibble [100 x 6]> <glm> 45 0.5 2.5 0.5 0.465 0.0683 0.377 0.465 0.331 0.599 0.333 0.333
APPENDIX
DEMONSTRATION OF HOW I'M ABLE TO GET WHAT I WANT WITHOUT A PIPE OR DATAFRAME
The following method relies on assigning variables for several steps along the way. For the sake of demonstration, it shows how to run the model and get the $family$linkinv
for just one fruit.
Data
As before, it's fruit_liking_df
after doing the arithmetic transformation to decimals, hence:
> as_tibble(fruit_liking_df)
## # A tibble: 100 x 8
## id i_love_apple i_love_banana i_love_mango age is_male education_level is_colorblinded
## <int> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl>
## 1 1 0.5 1 0.25 50 1 2 0
## 2 2 0.5 0.5 0 49 1 1 0
## 3 3 0.25 0 1 70 1 1 1
## 4 4 0.25 0.25 1 41 1 3 1
## 5 5 0.5 0 0 49 1 4 0
## 6 6 1 0.25 0 29 0 1 0
## 7 7 0.75 1 1 35 1 3 0
## 8 8 0 0.5 1 24 0 3 0
## 9 9 0.25 0.75 0.25 55 1 2 0
## 10 10 0.5 0.75 0.25 69 1 4 0
## # ... with 90 more rows
Model
I'm going to focus only on the i_love_apple
column data, and run glm
on it.
my_model <-
glm(
i_love_apple ~
I(age - 45) +
I((age - 45) ^ 2) +
I(is_male - 0.5) +
I(education_level - 2) +
I(is_colorblinded - 0.5),
family = quasibinomial,
data = fruit_liking_df
)
Prediction
Now I run predict()
on my_model
using prediction data from my_new_data_for_pred
:
prediction_link_type <-
predict(object = my_model,
newdata = my_new_data_for_pred,
type = "link", ## <------------ type = "link" is crucial to note
interval = "confidence",
level = 0.95,
se.fit = TRUE)
> prediction_link_type
## $fit
## 1
## 0.08427577
## $se.fit
## [1] 0.2606326
## $residual.scale
## [1] 0.7090294
Now I convert from the SE measure I got in prediction_link_type
to confidence interval (CI) by multiplying the SE by critval
(which has been assigned with 1.96
). I assign two separate vectors: one with the upper bound CI, and another with the lower bound CI:
lower_ci_link <- prediction_link_type$fit - (critval * prediction_link_type$se.fit)
upper_ci_link <- prediction_link_type$fit + (critval * prediction_link_type$se.fit)
Almost there! I got the CI values but they are in "link" space (because predict()
used type = "link"
). To convert the CI values back from "link", I use inverse-link function:
lower_ci_inverse_link <- my_model$family$linkinv(lower_ci_link)
upper_ci_inverse_link <- my_model$family$linkinv(upper_ci_link)
In Summary
Although this "vectors" method gets the job done, it is not what I'm looking for. Instead, I want to incorporate the conversion of "link -> SE -> CI -> inverselink" through the pipe introduced in the beginning of this question.