
I have data which looks like this:

library(dplyr)  # provides %>%, mutate(), group_by() and do()

df <- data.frame(
  time = rep(c("2010", "2011", "2012", "2013", "2014"), 4),
  age = rep(c("40-44", "45-49", "50-54", "55-59", "60-64"), 4),
  weight = rep(c(0.38, 0.23, 0.19, 0.12, 0.08), 4),
  # stored as a factor so that relevel() works (strings are not factors by default in R >= 4.0)
  ethgp = factor(rep(c(rep("M", 5), rep("NM", 5)), 2)),
  gender = c(rep("M", 10), rep("F", 10)),
  pop = round(runif(10, min = 10000, max = 99999), digits = 0),
  count = round(runif(10, min = 1000, max = 9999), digits = 0)
)

# crude rate, age-standardised rate per 100,000, and its rounded value
df <- df %>%
  mutate(rate = count / pop,
         asr_rate = (rate * weight) * 100000,
         asr_round = round(asr_rate, digits = 0))
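As a quick check of the arithmetic in the mutate() step, here is a one-row worked example. The input numbers are made up for illustration and are not taken from the real data.

```r
# Hypothetical single row: 1,500 events in a population of 50,000,
# with a standard-population weight of 0.38 for this age band
count  <- 1500
pop    <- 50000
weight <- 0.38

rate      <- count / pop              # crude rate (0.03)
asr_rate  <- (rate * weight) * 100000 # weighted rate per 100,000
asr_round <- round(asr_rate, digits = 0)

print(c(rate = rate, asr_rate = asr_rate, asr_round = asr_round))
```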

First, I remove every row that contains a zero value from the data frame:

df <- df[apply(df != 0, 1, all), ]
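To make the effect of this filter concrete, here is a small sketch on made-up toy data (the column values are hypothetical). It shows that a row is dropped as soon as any of its columns equals zero.

```r
# Toy data with one zero count (hypothetical values, not the real dataset)
toy <- data.frame(
  ethgp = c("M", "NM", "M", "NM"),
  count = c(12, 0, 7, 9),
  pop   = c(100, 120, 90, 110)
)

# toy != 0 is a logical matrix; apply(..., 1, all) is TRUE only for
# rows in which no column equals zero
keep <- apply(toy != 0, 1, all)
toy_nozero <- toy[keep, ]

print(toy_nozero)
```

One side effect worth keeping in mind: when the row for one ethnic group is removed, the matching sub-group is left with only the other ethnic group.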

Then I run the following code to fit a separate Poisson regression model for each sub-group in the data (defined by age, gender, and year), comparing the two ethnic groups (M / NM). I want to generate rate ratios and confidence intervals comparing M with NM for every sub-group.

Poisson_test <- df %>%
  group_by(time, gender, age) %>%
  do({
    # one Poisson model per sub-group, with the second level (NM) as the reference
    model <- glm(asr_round ~ relevel(ethgp, ref = 2), family = "poisson", data = .)
    data.frame(nlRR_MNM = coef(model)[[2]],
               SE_MNM = summary(model)$coefficients[, 2][2])
  })

This code works fine for the sample above.

When I run this code on my actual dataset, however, I get the following error message:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

Because I have only one explanatory variable, ethgp, I assume this is the source of the error?

I tested whether there are levels in my data (not in the sample data):

str(M_NM_NZ$ethgp)

R responds: Factor w/ 2 levels "M","NM": 1 1 1 1 1 1 1 1 1 1 ...

I checked whether there were NA values in ethgp:

sum(is.na(M_NM_NZ$ethgp))

R responds: [1] 0

Are there other reasons I might be getting this error message?

I have seen the question "Error in contrasts when defining a linear model in R", but in that example it sounds as though the explanatory variable is either not in the correct format or has NA values. This is not the case in my data.

Laura

1 Answer


I don't understand the underlying problem which causes this error when a factor does have more than one level.
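One possibility that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis for this dataset: str() reports the levels of the whole column, but the model is fitted separately within each group. glm() drops unused factor levels when it builds its model frame, so a group in which only one ethgp value is observed (for example, after rows with zeros were removed) raises this same error. A minimal sketch with made-up toy data, using a single grouping variable for brevity:

```r
# Toy data (hypothetical values): the 2012 group contains only "M"
toy <- data.frame(
  time      = c("2010", "2010", "2011", "2011", "2012"),
  ethgp     = factor(c("M", "NM", "M", "NM", "M"), levels = c("M", "NM")),
  asr_round = c(120, 95, 130, 101, 88)
)

str(toy$ethgp)  # the whole column still reports 2 levels

# Count the distinct ethgp values actually observed within each group
per_group <- aggregate(ethgp ~ time, data = toy,
                       FUN = function(x) length(unique(x)))
names(per_group)[2] <- "n_ethgp"
bad <- per_group[per_group$n_ethgp < 2, ]
print(bad)  # groups that cannot support an M vs NM comparison

# Fitting the model on such a group reproduces the contrasts error
msg <- tryCatch({
  glm(asr_round ~ ethgp, family = poisson, data = toy[toy$time == "2012", ])
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

In the real data the same count could be taken over time, gender and age together; any group listed in bad would need to be dropped or handled separately before fitting.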

In this instance I fixed the issue by converting the ethgp variable into a numeric variable.

df <- df %>%
  mutate(ethnum = ifelse(ethgp == "M", 1, 0))  # 1 = M, 0 = NM (reference)

I then ran the regressions using ethnum as the explanatory variable:

Poisson <- df %>%
  group_by(time, gender, age) %>%
  do({
    model <- glm(asr_round ~ ethnum, family = "poisson", data = .)
    ci <- confint(model)  # 95% confidence limits on the log scale
    data.frame(nlRR_MNM = coef(model)[[2]], nlUCI = ci[2, 2], nlLCI = ci[2, 1])
  })

# Exponentiate to obtain the rate ratio and its confidence limits
Poisson <- mutate(Poisson,
                  RR_MNM = round(exp(nlRR_MNM), digits = 3),
                  UCI = round(exp(nlUCI), digits = 3),
                  LCI = round(exp(nlLCI), digits = 3))

This code also computes the upper and lower 95% confidence intervals for each rate ratio.
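For comparison, a rate ratio and an approximate 95% interval can also be derived directly from the coefficient and its standard error (a Wald interval), which is what the SE column in the question would allow. The sketch below uses made-up toy counts and is an illustration of the arithmetic only, not the method used above; confint() on a glm typically returns profile-likelihood limits, so the two intervals will usually differ slightly.

```r
# Toy data for a single sub-group (hypothetical counts)
toy <- data.frame(
  ethnum    = c(1, 0, 1, 0),          # 1 = M, 0 = NM
  asr_round = c(120, 95, 130, 101)
)

model <- glm(asr_round ~ ethnum, family = poisson, data = toy)

est <- coef(summary(model))["ethnum", "Estimate"]    # log rate ratio
se  <- coef(summary(model))["ethnum", "Std. Error"]  # its standard error

rr   <- exp(est)                                     # rate ratio, M vs NM
wald <- exp(est + c(-1, 1) * qnorm(0.975) * se)      # Wald 95% limits

print(round(c(RR = rr, LCI = wald[1], UCI = wald[2]), 3))

# confint.default() computes the same Wald limits on the log scale
print(round(exp(confint.default(model)["ethnum", ]), 3))
```

With a single binary predictor, the estimated rate ratio equals the ratio of the two group means, which gives a simple sanity check on the output.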

Laura