Why is the predict() function recycling the same four values for my GLM?

Question

I made a fairly simple binomial regression model:

m_r <- mle2(ig$v ~ dbinom(size=ig$n, prob = 1/(1+exp(-(a + br * ig$river_dist)))),
    start = list(a = 0, br = 0), data = ig)

based on this dataframe:

> ig
    v   n ig_dist river_dist tam_dist       site
1 102 256     950       1040     1040     Boveda
2   1  11    4800        720      832 Cuchaconga
3  19  24    2000        475      475   Ishpingo
4  12  15    3400        611      800    La Joya

And now I'd like to graph predicted outcomes for a range of possible 'river_dist' values. To do that, I created a new dataframe:

newdat <- data.frame(river_dist=seq(min(ig$river_dist), max(ig$river_dist),len=100))

and tried to add predicted values based on the model:

newdat$v <- predict(m_r, newdata=newdat, type="response")

But it seems to be recycling the same four values over and over (short sample, but the numbers keep repeating):

> head(newdat)
  river_dist         v
1   475.0000 95.110424
2   480.7071  7.450936
3   486.4141 20.330167
4   492.1212 11.456229
5   497.8283 95.110424
6   503.5354  7.450936

What am I doing wrong?

Edit: By changing 'ig$river_dist' to 'river_dist' in my model, I'm able to produce what look like real predictions, but they're still following a four-value cycle (with slight variations each time), producing zigzags in my graph, rather than a slope or curve I was expecting. If someone could explain why, I'd appreciate it! My plotting:

plot(v~river_dist, data=ig, col="red4")
lines(v ~ river_dist, newdat, col="green4", lwd=2)

`predict(m_r, newdata=newdat, type="response")` only gives four values and these are thus recycled. But I don't know why there are only four values returned although you `newdat` data frame has 100 rows since I don't know this package. — Ahorn, May 23 '20 at 18:13

score 0 · Answer 1 · answered May 23 '20 at 18:21

Ok the problem was in the formula specification. You referred to the river_dist variable as ig$river_dist although you specified the data argument already. So the variable name used in the fitting process did not match the variable name of the data used for the predict function.

library(bbmle)

ig <- tibble::tribble(
    ~v,   ~n, ~ig_dist, ~river_dist, ~tam_dist,        ~site,
  102L, 256L,     950L,       1040L,     1040L,     "Boveda",
    1L,  11L,    4800L,        720L,      832L, "Cuchaconga",
   19L,  24L,    2000L,        475L,      475L,   "Ishpingo",
   12L,  15L,    3400L,        611L,      800L,    "La Joya"
  )

m_r <- mle2(ig$v ~ dbinom(size=ig$n, prob = 1/(1+exp(-(a + br * river_dist)))),
            start = list(a = 0, br = 0), data = ig)

newdat <- data.frame(river_dist=seq(min(ig$river_dist), max(ig$river_dist),len=100))

newdat$v <- predict(m_r, newdata=newdat, type="response")

head(newdat$v)

#> [1] 216.855114   9.285536  20.187424  12.571487 213.762248   9.150584

Thank you! This solved at least part of the problem. The numbers do still have a wacky cycle to them (c. 220, c. 9-10, c. 20, c. 12) that creates crazy zig-zags in my plot, rather than the gentle curve I was expecting. (I'll edit the post to show my plotting commands). That may be an issue with my modeling strategy, rather than the code itself? Any ideas? — Lauren Pratt, May 23 '20 at 21:16
Yes I see. No sorry no ideas, not familiar with this kind of modeling. Hope I could help anyway. Maybe try Cross Validated to get help with the modeling part. — Ahorn, May 23 '20 at 21:56

Why is the predict() function recycling the same four values for my GLM?

1 Answers1