3

I am trying to model the relationship between conspecific and heterospecific pollen counts on individual flowers under a poisson or negative binomial distribution using the glm()/glm.nb() functions in R (3.6.2):

install.packages("MASS")
library("MASS")

cphp.p<-glm(CP ~ HP, family = poisson, data = pollendata) 
cphp.nb<-glm.nb(CP ~ HP, data = pollendata)

I've been using the rootogram() function in countreg to assess goodness of fit for the models, but the values displayed on the plot's x-axis don't seem correct to me:

install.packages("countreg", repos="http://R-Forge.R-project.org")
library("countreg")

par(mfrow = c(1,2))
rootogram(cphp.p, style = "hanging", scale = "sqrt") 
rootogram(cphp.nb, style = "hanging", scale = "sqrt")

rootograms for cphp.p and cphp.nb

If I understand correctly, rootograms should be displaying the range of observed values of the response variable along the x-axis, with the bars of observed frequencies along the y-axis - similar to a histogram of count frequencies. My response variable ranges from 0-467, but the rootogram only displays values from 0-120 along the x-axis and it shows a really poor fit to the specified distribution (none of the bars root on the zero-line). Plotting the frequencies of my observed response values using hist() gives me a plot that corresponds to a poisson-type distribution:

histogram of CP values

par(mfrow = c(1,1))
hist(pollendata$CP) 

Can anyone make sense of the rootogram outputs and perhaps explain why the model fit is so poor? I am a student, and quite new to statistical modelling, so I would appreciate layperson terminology where possible. Many thanks in advance.

My data:

 > dput(pollendata)

structure(list(CP = c(182, 112, 255, 5, 25, 48, 66, 72, 68, 4, 
206, 89, 115, 196, 170, 0, 92, 138, 21, 175, 61, 233, 1, 148, 
115, 129, 11, 103, 135, 63, 98, 138, 0, 29, 30, 84, 53, 162, 
204, 30, 327, 13, 62, 168, 72, 142, 46, 152, 150, 75, 233, 38, 
123, 169, 116, 8, 103, 97, 243, 102, 153, 16, 170, 87, 297, 303, 
60, 0, 194, 110, 13, 285, 5, 37, 91, 202, 325, 111, 156, 163, 
220, 215, 150, 37, 128, 247, 23, 121, 72, 169, 62, 167, 84, 63, 
46, 56, 216, 41, 13, 247, 12, 237, 88, 64, 207, 139, 65, 75, 
66, 36, 150, 378, 131, 3, 76, 467, 346, 72, 86, 39, 171, 12, 
66, 107, 78), HP = c(61, 167, 0, 0, 0, 230, 0, 0, 111, 1, 27, 
15, 40, 54, 111, 1, 94, 25, 1, 3, 76, 27, 1, 261, 166, 227, 0, 
16, 148, 185, 28, 97, 0, 4, 0, 138, 191, 24, 105, 87, 56, 0, 
32, 30, 92, 16, 7, 118, 0, 57, 98, 26, 64, 48, 146, 0, 91, 148, 
87, 10, 69, 0, 66, 73, 59, 128, 80, 0, 199, 167, 55, 156, 425, 
385, 77, 220, 264, 39, 132, 75, 234, 94, 107, 195, 124, 120, 
58, 65, 83, 128, 46, 117, 162, 69, 108, 83, 64, 127, 185, 38, 
160, 86, 55, 73, 0, 88, 86, 117, 69, 117, 102, 88, 69, 78, 72, 
105, 265, 105, 104, 140, 115, 37, 22, 82, 105)), class = "data.frame", row.names = c(NA, 
-125L))
benson23
  • 16,369
  • 9
  • 19
  • 38
  • Hi, Welcome to SO! It would make it easier for others to help you if you provided your sample data in a form that is more reproducible. This can be done with ```dput()```. For example, suppose that your data object is named 'data'. To edit (grey 'edit' link) your question, replace the data you have shown with the copy-pasted console output generated by executing the code ```dput(data)```. – xilliam Aug 26 '20 at 10:51
  • Thank you @xilliam. I have edited accordingly. – Arjan Engelen Aug 26 '20 at 16:02
  • not an answer but this might be useful: https://stackoverflow.com/questions/43075911/examining-residuals-and-visualizing-zero-inflated-poission-r/ – user63230 Aug 27 '20 at 07:13
  • I think the problem might be that I've assumed the wrong distribution for my model since the data are counts per flower and not counts over an interval of time or space. So maybe modelling with a Gaussian distribution rather than Poisson/nb would be more appropriate? – Arjan Engelen Aug 27 '20 at 09:01
  • 1
    @Arjan Engelen: I think that sounds correct. Consider also using a sqrt transformation, since your CP distribution is not really "normal". Also note that the Stack Exchange community is discouraged from carrying on long discussions in the comments. Instead, you could ask a new question. In your case, since the question is more about statistics than programming, try Cross Validated. – xilliam Aug 27 '20 at 11:29
  • I discovered the `countreg` and the `rootogram()` function today and I have the same issue. The rootogram does not show the full range of my data and there are even some discrepancies between different models of the same data. Have you found a solution/explanation for this? @ArjanEngelen – degeso May 06 '23 at 17:56

0 Answers0