The hardest part of this was trying to recreate your data structure so that we could have a working example for your code. Of course, the values and factor levels will be completely different from your own data, but it should be enough for a demonstration:
set.seed(69)
df <- data.frame(Education_a = factor(c("Private", "Public")),
GroupA_a = factor(c("A1", "A2")),
GroupB_a = factor(c("B1", "B2")),
GroupC_a = factor(c("C1", "C2")),
GroupD_a = factor(c("D1", "D2")),
GroupE_a = factor(c("E1", "E2")))
BCC_a <- expand.grid(df)[rep(1:64, 20), ]
BCC_a$Age_a <- round(rgamma(64 * 20, 15, 1))
BCC_a$Income_a <- rgamma(64 * 20, 15, 1/2000)
lambdas <- apply(do.call(cbind, lapply(BCC_a[1:6],
function(x) runif(2, 0.5, 1.5)[as.numeric(x)]
)), 1, prod)
BCC_a$PS_av <- rpois(nrow(BCC_a), 1 + lambdas/2 * BCC_a$Age_a^2 + 0.001 * BCC_a$Income_a)
Here I have assumed that age and income are numeric variables, whereas the groups are factor variables:
head(BCC_a)
#> Education_a GroupA_a GroupB_a GroupC_a GroupD_a GroupE_a Age_a Income_a PS_av
#> 1 Private A1 B1 C1 D1 E1 15 30500.19 162
#> 2 Public A1 B1 C1 D1 E1 16 41160.54 170
#> 3 Private A2 B1 C1 D1 E1 13 43146.83 107
#> 4 Public A2 B1 C1 D1 E1 18 33023.85 124
#> 5 Private A1 B2 C1 D1 E1 8 31122.07 65
#> 6 Public A1 B2 C1 D1 E1 21 26487.43 215
Now let's create your model:
library(MASS)
M_PS_av <- glm.nb(PS_av ~ poly(Age_a,2) + Income_a + Education_a + GroupA_a +
GroupB_a + GroupC_a + GroupD_a + GroupE_a, data = BCC_a)
And we can review it with summary(M_PS_av)
#> glm.nb(formula = PS_av ~ poly(Age_a, 2) + Income_a + Education_a +
#> GroupA_a + GroupB_a + GroupC_a + GroupD_a + GroupE_a, data = BCC_a,
#> init.theta = 814.4965099, link = log)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -3.4821 -0.6993 -0.0217 0.6828 4.1628
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 4.750e+00 1.273e-02 372.981 < 2e-16 ***
#> poly(Age_a, 2)1 1.309e+01 1.012e-01 129.326 < 2e-16 ***
#> poly(Age_a, 2)2 -1.077e+00 8.885e-02 -12.118 < 2e-16 ***
#> Income_a 8.215e-06 3.486e-07 23.565 < 2e-16 ***
#> Education_aPublic -1.487e-01 5.464e-03 -27.218 < 2e-16 ***
#> GroupA_aA2 -3.534e-01 5.523e-03 -63.989 < 2e-16 ***
#> GroupB_aB2 -2.518e-02 5.481e-03 -4.593 4.37e-06 ***
#> GroupC_aC2 7.447e-02 5.445e-03 13.676 < 2e-16 ***
#> GroupD_aD2 -3.102e-02 5.442e-03 -5.701 1.19e-08 ***
#> GroupE_aE2 -4.514e-02 5.446e-03 -8.289 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for Negative Binomial(814.4965) family taken to be 1)
#>
#> Null deviance: 26983 on 1279 degrees of freedom
#> Residual deviance: 1345 on 1270 degrees of freedom
#> AIC: 9952.3
#>
#> Number of Fisher Scoring iterations: 1
#>
#> Theta: 814
#> Std. Err.: 234
#> 2 x log-likelihood: -9930.252
Now, to use predict
we need a data frame of the predictors set to the levels we want to examine. Note we need all predictors, and if there are factor variables, we need to give the named factor levels:
new_data <- data.frame(Age_a = 15, Income_a = mean(BCC_a$Income_a),
Education_a = "Private", GroupA_a = "A1", GroupB_a = "B1",
GroupC_a = "C1", GroupD_a = "D1", GroupE_a = "E1")
Now we just plug this into predict. Note that we need to use type = "response"
to get the actual expected value of the outcome variable (otherwise we will get the natural log of the expected value):
predict(M_PS_av, newdata = new_data, type = "response")
#> 1
#> 153.0262
This looks correct for the data I have put in.