Is there a way to plot hurdle model results pscl package or a way to plot the count - zero truncated negbin part of hurdle model in r?

Question

Is there a way to plot hurdle model results in R? I was able to plot the zero part (binomial with logit link) of the hurdle model (below) but I can't figure out how to plot the count part of the model (truncated negative binomial with log link). I am using the pscl package for the hurdle model.

Example data (df name = data):

structure(list(Byths = c(333L, 107L, 0L, 0L, 684L, 0L, 113L, 
0L, 0L, 20L, 251L, 20L, 0L, 0L, 32L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 182L, 30L, 0L, 0L, 183L, 0L, 0L, 0L, 8L, 213L, 108L, 
310L, 960L, 720L, 0L, 0L, 6L, 72L, 78L, 15L, 196L, 256L, 608L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 105L, 0L, 194L, 0L, 440L, 0L, 0L, 
0L, 0L, 0L, 18L, 0L, 239L, 262L, 0L, 102L, 17L, 0L, 0L, 0L, 0L, 
0L, 68L, 93L, 0L, 226L, 118L, 91L, 330L, 104L, 68L, 224L, 0L, 
0L, 18L, 79L, 71L, 8L, 73L, 38L, 39L, 7L), Season = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("Pre", 
"Peak", "Post"), class = "factor"), depthm = c(20.1170446232626, 
9.1441111923921, 18.2882223847842, 18.2882223847842, 17.373811265545, 
13.7161667885881, 13.7161667885881, 13.106559375762, 13.106559375762, 
8.53450377956596, 8.53450377956596, 12.4969519629359, 20.1170446232626, 
20.1170446232626, 8.83930748597903, 19.2026335040234, 19.2026335040234, 
20.1170446232626, 20.1170446232626, 20.1170446232626, 20.1170446232626, 
20.1170446232626, 14.0209704950012, 14.0209704950012, 14.0209704950012, 
14.0209704950012, 14.0209704950012, 8.53450377956596, 12.8017556693489, 
20.7266520360888, 20.7266520360888, 17.373811265545, 14.0209704950012, 
14.0209704950012, 14.0209704950012, 8.22970007315289, 8.22970007315289, 
18.5930260911973, 17.373811265545, 8.53450377956596, 7.92489636673982, 
7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982, 
7.92489636673982, 9.44891489880517, 9.44891489880517, 9.44891489880517, 
9.44891489880517, 15.8497927334796, 12.1921482565228, 8.83930748597903, 
8.83930748597903, 18.8978297976103, 8.83930748597903, 8.83930748597903, 
17.0690075591319, 20.4218483296757, 20.4218483296757, 20.1170446232626, 
20.1170446232626, 13.106559375762, 13.106559375762, 13.106559375762, 
13.106559375762, 13.106559375762, 13.106559375762, 8.83930748597903, 
10.6681297244574, 9.75371860521824, 9.75371860521824, 9.75371860521824, 
11.5825408436967, 9.1441111923921, 17.0690075591319, 17.0690075591319, 
17.0690075591319, 17.0690075591319, 17.0690075591319, 17.0690075591319, 
20.1170446232626, 10.0585223116313, 17.373811265545, 20.1170446232626, 
20.4218483296757, 20.4218483296757, 7.92489636673982, 7.92489636673982, 
7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982, 
7.92489636673982, 7.92489636673982)), row.names = c(5L, 7L, 12L, 
13L, 14L, 16L, 18L, 21L, 22L, 23L, 24L, 26L, 28L, 29L, 32L, 33L, 
34L, 35L, 36L, 37L, 38L, 39L, 46L, 48L, 49L, 50L, 51L, 52L, 53L, 
54L, 55L, 56L, 57L, 59L, 61L, 62L, 63L, 65L, 66L, 71L, 73L, 75L, 
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 92L, 95L, 98L, 
107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 
118L, 119L, 125L, 137L, 139L, 142L, 143L, 144L, 147L, 150L, 151L, 
152L, 154L, 156L, 157L, 159L, 160L, 161L, 162L, 163L, 165L, 177L, 
178L, 179L, 180L, 181L, 183L, 184L, 185L), class = "data.frame")

Model:

mod <- hurdle(Byths ~ depthm * Season, data = data, dist = "negbin")

summary(mod)

Gives:

Call:
hurdle(formula = Byths ~ depthm * Season, data = data, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.8326 -0.5676 -0.2256  0.1534  4.7222 

Count model coefficients (truncated negbin with log link):
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        5.88538    0.63004   9.341  < 2e-16 ***
depthm            -0.05014    0.05577  -0.899 0.368658    
SeasonPeak        -3.03180    0.86315  -3.512 0.000444 ***
SeasonPost        -2.74270    1.95696  -1.402 0.161061    
depthm:SeasonPeak  0.17058    0.07106   2.400 0.016380 *  
depthm:SeasonPost  0.21029    0.13113   1.604 0.108783    
Log(theta)         0.13512    0.19772   0.683 0.494367    
Zero hurdle model coefficients (binomial with logit link):
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        4.02258    1.10922   3.627 0.000287 ***
depthm            -0.32024    0.08317  -3.851 0.000118 ***
SeasonPeak        -2.55477    1.65557  -1.543 0.122798    
SeasonPost        -3.94742    3.35163  -1.178 0.238892    
depthm:SeasonPeak  0.27367    0.12088   2.264 0.023569 *  
depthm:SeasonPost  0.30061    0.21990   1.367 0.171617    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta: count = 1.1447
Number of iterations in BFGS optimization: 15 
Log-likelihood: -344.6 on 13 Df

I exponentiated the coefficients to interpret them, however I would like to make some figures for my thesis.

The only way I figured out how to plot the zero part of the hurdle model was by doing the following:

#turned counts into presence/absence

structure(list(Byths = c(1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 
1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1), 
    Season = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("Pre", 
    "Peak", "Post"), class = "factor"), depthm = c(20.1170446232626, 
    9.1441111923921, 18.2882223847842, 18.2882223847842, 17.373811265545, 
    13.7161667885881, 13.7161667885881, 13.106559375762, 13.106559375762, 
    8.53450377956596, 8.53450377956596, 12.4969519629359, 20.1170446232626, 
    20.1170446232626, 8.83930748597903, 19.2026335040234, 19.2026335040234, 
    20.1170446232626, 20.1170446232626, 20.1170446232626, 20.1170446232626, 
    20.1170446232626, 14.0209704950012, 14.0209704950012, 14.0209704950012, 
    14.0209704950012, 14.0209704950012, 8.53450377956596, 12.8017556693489, 
    20.7266520360888, 20.7266520360888, 17.373811265545, 14.0209704950012, 
    14.0209704950012, 14.0209704950012, 8.22970007315289, 8.22970007315289, 
    18.5930260911973, 17.373811265545, 8.53450377956596, 7.92489636673982, 
    7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982, 
    7.92489636673982, 9.44891489880517, 9.44891489880517, 9.44891489880517, 
    9.44891489880517, 15.8497927334796, 12.1921482565228, 8.83930748597903, 
    8.83930748597903, 18.8978297976103, 8.83930748597903, 8.83930748597903, 
    17.0690075591319, 20.4218483296757, 20.4218483296757, 20.1170446232626, 
    20.1170446232626, 13.106559375762, 13.106559375762, 13.106559375762, 
    13.106559375762, 13.106559375762, 13.106559375762, 8.83930748597903, 
    10.6681297244574, 9.75371860521824, 9.75371860521824, 9.75371860521824, 
    11.5825408436967, 9.1441111923921, 17.0690075591319, 17.0690075591319, 
    17.0690075591319, 17.0690075591319, 17.0690075591319, 17.0690075591319, 
    20.1170446232626, 10.0585223116313, 17.373811265545, 20.1170446232626, 
    20.4218483296757, 20.4218483296757, 7.92489636673982, 7.92489636673982, 
    7.92489636673982, 7.92489636673982, 7.92489636673982, 7.92489636673982, 
    7.92489636673982, 7.92489636673982)), row.names = c(5L, 7L, 
12L, 13L, 14L, 16L, 18L, 21L, 22L, 23L, 24L, 26L, 28L, 29L, 32L, 
33L, 34L, 35L, 36L, 37L, 38L, 39L, 46L, 48L, 49L, 50L, 51L, 52L, 
53L, 54L, 55L, 56L, 57L, 59L, 61L, 62L, 63L, 65L, 66L, 71L, 73L, 
75L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 92L, 95L, 
98L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 
117L, 118L, 119L, 125L, 137L, 139L, 142L, 143L, 144L, 147L, 150L, 
151L, 152L, 154L, 156L, 157L, 159L, 160L, 161L, 162L, 163L, 165L, 
177L, 178L, 179L, 180L, 181L, 183L, 184L, 185L), class = "data.frame")

Then plotted hurdle part of model...

ggplot(dataPA, aes(x = depthm, y = Byths)) + 
  geom_point() +
  stat_smooth(method = "glm", method.args = list(family="binomial"), se = TRUE) + facet_wrap(~ Season)

Is there a way to plot the count model part of the hurdle model (truncated negbin)? Or a better way to plot hurdle model results in general?

Should I be using predict.hurdle for this and use predicted values? If so, how do I do this, the examples I've found haven't been clear enough for me to understand and what I have been trying for a few weeks has not worked.

Thank you!

Does this [Hurdle model results interpretation and plotting](https://stats.stackexchange.com/questions/215962/hurdle-model-results-interpretation-and-plotting) helps? — Quinten, Nov 26 '22 at 09:15
@Quinten I had looked at that post and it was very helpful - I made those plots with my data for personally seeing what was going on with the model but was looking for a nicer way of plotting model results for my thesis. Thank you! — FishyFishies, Nov 28 '22 at 02:08

score 2 · Accepted Answer · answered Nov 26 '22 at 09:57

Plotting count part of model

You can easily do the same thing you've done for the hurdle part, but now with the count part of the model. You just need to use the negative binomial model, but only the data where Byths > 0. The other complication, is that glm() doesn't have capability for negative binomial model, so you'll need to use MASS::glm.nb()

Here you go:

ggplot(data[data$Byths != 0, ], aes(x = depthm, y = Byths)) + 
  geom_point() +
  stat_smooth(method = MASS::glm.nb, se = TRUE) + facet_wrap(~ Season)

Now I wanted to make sure that this would get the same result as if we used the predict functionality from the function authors. Good news is that we do, but that above we easily get the confidence interval for our predictions.

newdat <- expand.grid(
  depthm = seq(8, 20, .25),
  Season = c("Pre", "Peak", "Post")
)
newdat$count_mean <- predict(mod, newdat, type = "count")
ggplot(newdat, aes(x = depthm, y = count_mean)) + 
  geom_point() + 
  facet_wrap(~ Season) +
  ylim(c(0, 1250))

Plotting overall model

If this was my research, I'd likely want to have a look at my model results overall. So here I plot both the expected count for each value -- combing the count and hurdle part of the model, our overall prediction. I'd also want to get a look at the whole predicted distributions, not just the means. So here, I plot the whole distribution (at selected values) so that the model predictions are more clear. To do this, I used the {ggdist} package (I tried to enter the model predictions straight there, but couldn't figure out how to do it, so rather I sampled from the predicted distribution 10,000 times).

Here you go:

library(ggdist)

mean_dat <- expand_grid(
  depthm = seq(8, 20, 0.25),
  Season = c("Pre", "Peak", "Post")
) %>% 
  mutate(
    mean_count = predict(mod, ., type = "response")
  ) 
  
simdat <- expand_grid(
  depthm = c(8, 12, 16, 20),
  Season = c("Pre", "Peak", "Post")
  ) %>% 
  mutate(
    counts = predict(mod, ., type = "prob", at = 0:1000)
  ) %>% 
  nest(counts) %>% 
  mutate(
    sim_count = map(data, ~ sample(0:1000, size = 10000, replace = TRUE, prob = t(.x))),
    .keep = "unused"
  ) %>% 
  unnest()


ggplot(mean_dat, aes(x = depthm, y = mean_count)) + 
  geom_line() + 
  geom_line() + 
  stat_slab(aes(y = sim_count), fill = "skyblue", alpha = .5, data = simdat) +
  facet_wrap(~ Season)

The black line shows the expected count for the predictor values, and the distributions in blue show (samples from) the predicted distribution at some select values.

Now in my opinion, this visualisation does a better job at showing what the model really predicts. Most values are still clumped near 0, but there is large variance, particularly as depthm gets larger.

Wow Thank you so much, this is extremely helpful! I had tried stat_smooth(method =glm.nb) but I think it wasnt working because I didn't include "MASS::". Thank you for showing me how to plot the overall model, this is actually what I want but couldnt figure out if it was possible to plot the overall model without splitting it into the two parts, very helpful! — FishyFishies, Nov 28 '22 at 02:06

Is there a way to plot hurdle model results pscl package or a way to plot the count - zero truncated negbin part of hurdle model in r?

1 Answers1

Plotting count part of model

Plotting overall model