Fitting Negative Binomial Model

Question

I am trying to run the negative binomial model for the following model. Replicating the results of Table 3 in this paper Association Between Gun Law Reforms

di = ln(ni) + β00 + β10 yeari + ei, i = 1997, …, 2013

where, d = crude death rate, n = person-years at risk and years is from 1997 to 2013.

The above model is used to estimate the trend (measured as the average relative change in death rates per year) in deaths in the periods (1997-2013)

How can I fit this model in R. Any suggestions on Syntax in R. Also can someone please explain what are β00 and β10? How should I select these values?

Below are the values in my dataframe

'data.frame':   17 obs. of  13 variables:
 $ X                : int  1 2 3 4 5 6 7 8 9 10 ...
 $ year             : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 .
 $ personyearsatrisk: int  18423037 18607584 18812264 19028802 19274701 19495210 19720737 19932722 20176844 20450966 ...
 $ suicidegun       : int  296 248 261 222 262 213 188 171 145 192 ...
 $ homocidegun      : int  74 56 51 60 48 50 46 18 20 36 ...
 $ massgun          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ suicidenotgun    : int  2351 2393 2229 2170 2208 2115 1976 1948 1951 1952 ...
 $ homocidenotgun   : int  253 243 247 246 268 243 228 147 171 210 ...
 $ suicidetotal     : int  2647 2641 2490 2392 2470 2328 2164 2119 2096 2144 ...
 $ homocidetotal    : int  327 299 298 306 316 293 274 165 191 246 ...
 $ avgsuicidegun    : num  1.61 1.33 1.39 1.17 1.36 ...
 $ avghomocidegun   : num  0.402 0.301 0.271 0.315 0.249 ...
 $ avgsum           : num  2.01 1.63 1.66 1.48 1.61 ...

Below are my values dput(dataframe)

structure(list(X = 1:17, year = 1997:2013, personyearsatrisk = c(18423037L, 
18607584L, 18812264L, 19028802L, 19274701L, 19495210L, 19720737L, 
19932722L, 20176844L, 20450966L, 20827622L, 21249199L, 21691653L, 
22031750L, 22340024L, 22728254L, 23117353L), suicidegun = c(296L, 
248L, 261L, 222L, 262L, 213L, 188L, 171L, 145L, 192L, 188L, 184L, 
165L, 174L, 145L, 176L, 166L), homocidegun = c(74L, 56L, 51L, 
60L, 48L, 50L, 46L, 18L, 20L, 36L, 29L, 26L, 37L, 39L, 32L, 41L, 
35L), massgun = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L), suicidenotgun = c(2351L, 2393L, 2229L, 
2170L, 2208L, 2115L, 1976L, 1948L, 1951L, 1952L, 2072L, 2141L, 
2196L, 2239L, 2329L, 2374L, 2454L), homocidenotgun = c(253L, 
243L, 247L, 246L, 268L, 243L, 228L, 147L, 171L, 210L, 199L, 229L, 
232L, 202L, 240L, 221L, 184L), suicidetotal = c(2647L, 2641L, 
2490L, 2392L, 2470L, 2328L, 2164L, 2119L, 2096L, 2144L, 2260L, 
2325L, 2361L, 2413L, 2474L, 2550L, 2620L), homocidetotal = c(327L, 
299L, 298L, 306L, 316L, 293L, 274L, 165L, 191L, 246L, 228L, 255L, 
269L, 241L, 272L, 262L, 219L)), .Names = c("X", "year", "personyearsatrisk", 
"suicidegun", "homocidegun", "massgun", "suicidenotgun", "homocidenotgun", 
"suicidetotal", "homocidetotal"), class = "data.frame", row.names = c(NA, 
-17L))

I calculated the avgsuicidegun for 100,000 people in a year basically (suicidegun * 100000)/personyearsatrisk and similarly avghomocidegun.

The avgsum is the sume of avgsuicidegun and avghomocidegun

The below image shows the mean rate of firearms death per year trends. Please refer to trend line from 1997 to 2013 in this figure:

Fire Arms Mean Rate .

I am trying to find the Ratio of trends estimate in Annual Death Rate per 100000 Population (95% CI) for deaths involving FireArms

See this question for how to ask a good question: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Grubbmeister, Oct 19 '18 at 02:24
I have edited and added some additional content to the Question. I am new to R and I am finding it difficult to fit the equation in the Model. — Srin, Oct 19 '18 at 05:14
It's not possible to use the data you have provided. Use dput(yourdataframe) to get usable data for other people. — Grubbmeister, Oct 19 '18 at 05:40
Also, you need to explain your dataset a little more. You are trying to derive a avgsum vs year plot, and fit a line to it. That isn't really what you describe in the first part of the question. Your model is actually something like avgsum~year. I also don't understand how you can have a mean rate of firearms death per year - there is only one replicate per year. — Grubbmeister, Oct 19 '18 at 05:49
I don't understand what the paper has done. The trends look like just simple linear regression (so I don't know why they have apparently used a negative binomial distribution), but the slopes in table 3 don't match the trends in the figure - at least some should be negative. I don't have time to figure out what they've done, but I will edit my answer give you a negative binomial model. — Grubbmeister, Oct 20 '18 at 01:24
You could try asking the question on cross validated, since this is mostly a stats question, as it turns out..... — Grubbmeister, Oct 20 '18 at 09:48

Ben Bolker · Answer 1 · 2018-10-20T16:15:03.287

1

I agree with @Grubbmeister that your description of the model looks odd. It should probably be:

To fit this model:

## compute total gun deaths per year
gundata <- transform(gundata, guntotal=suicidegun+homocidegun)
library(MASS)
g1 <- glm.nb(guntotal ~ 1 + year + offset(log(personyearsatrisk)),
       data=gundata)

To extract the coefficients you want:

coef(g1)  
coef(summary(g1))
1-exp(coef(g1)["year"])  ## 0.04903379

The "year" coefficient is -0.05027, which means gun deaths are decreasing at approximately 5%/year; the more precise value is given by the calculation above (i.e. decrease is 4.9% year).

I had only a quick look at the paper you linked, but the coefficient here (exp(coef(g1)["year"])) appears to agree with the value of 0.951 they quoted for a trend in the 1997-2013 period (this is the expected multiplicative decrease in gun deaths over the course of one year; the negative binomial model uses a logarithmic link function).

edited Oct 20 '18 at 16:15

answered Oct 20 '18 at 16:08

Ben Bolker

211,554
25
370
453

Thanks, I found that the model equation I provided is a poisson log linear regression model. So the log shouldn't be there for the Dependent variable-d. I edited the question – Srin Oct 20 '18 at 23:49
Is this the code for finding the Poisson log linear regression model g2 <- glm(guntotal ~ 1 + year + offset(log(personyearsatrisk)),family=poisson(link=log),data=gundata) – Srin Oct 21 '18 at 01:59
yes. I think it should be `family=poisson(link="log")` or just `family=poisson` (the log link is the default) – Ben Bolker Oct 21 '18 at 03:32
Excellent answer @BenBolker. Reading the paper, I find reporting the slope as mulitplicative decreases is very difficult to read. Srin, I thought this question might help you with choosing a regression model: https://stats.stackexchange.com/questions/71961/understanding-over-dispersion-as-it-relates-to-the-poisson-and-the-neg-binomial – Grubbmeister Oct 21 '18 at 10:01

Grubbmeister · Answer 2 · 2018-10-20T01:32:50.137

I don't quite understand how you can use a negative binomial distribution on a predictor that has been log transformed, since it is designed for integer data.

Anyway... we will work with what we have. I have not log transformed anything, but to do that, put log() around the variable you have to transform.

You should also have provided the following line of code:

death$avgsum<-((death$homocidegun+death$suicidegun)/death$personyearsatrisk)*100000

    install.packages("MASS")
    library("MASS")
    mod<-glm.nb(avgsum~year, data= death)
    #to check the residuals
    plot(mod)



#However, I don't think a negative binomial distribution works with this data, so I'd just use a simple linear regression instead:

mod1<-lm(avgsum~year, data= death)

The betas you ask about are constants in the final model, which you can obtain by:

summary(mod)

To get R2 values for the negative binomial model:

install.packages("jtools")
library(jtools)
summ(mod)

Fitting Negative Binomial Model

2 Answers2