I've built a GLM model that basically says the following:

glm(conversion ~ action, data = data, family = "binomial"(link="logit"))

Some values of the variable "action" aren't relevant to this model (e.g. "Did not use"). However, those records are still important in the other models I've built, so I'd rather not filter my data frame just for this one model if at all possible.

This question tells me how to exclude columns. Is there a way to exclude specific records from GLM in the formula?

1 Answer

You can use the subset argument that many of the modelling functions in R have. For example:

glm(conversion ~ action, data = data, family = binomial(),
    subset = action != "Did not use")

will fit the model to the data set after removing rows where action == "Did not use". If you have additional levels in action to drop, you might use

glm(conversion ~ action, data = data, family = binomial(),
    subset = !(action %in% c("Did not use", "Other")))

which will exclude any rows where action is equal to either of the supplied options.
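To check that subset behaves like fitting on a pre-filtered copy without touching the original data frame, here is a minimal sketch on simulated data. The data frame dat, its level names and the conversion rate are made up for illustration and are not the asker's data.

```r
set.seed(1)

# Hypothetical stand-in for the asker's data frame
dat <- data.frame(
  action = factor(sample(c("Clicked", "Viewed", "Did not use", "Other"),
                         200, replace = TRUE)),
  conversion = rbinom(200, 1, 0.4)
)

# Fit using subset; dat itself is not modified
fit_subset <- glm(conversion ~ action, data = dat, family = binomial(),
                  subset = action != "Did not use")

# Fit on an explicitly filtered copy, for comparison
fit_filtered <- glm(conversion ~ action,
                    data = dat[dat$action != "Did not use", ],
                    family = binomial())

# The two fits should agree on the coefficients
all.equal(coef(fit_subset), coef(fit_filtered))

# Only the retained rows are used, and the original data keep all 200 rows
nobs(fit_subset) == sum(dat$action != "Did not use")
nrow(dat)
```

The advantage of subset is that the same dat can be passed unchanged to the other models that do need the excluded records.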

You might also want to look at the drop.unused.levels argument to model.frame, which is the function that will act on any subset argument you supply to glm().
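As a rough illustration of what drop.unused.levels does, the sketch below calls model.frame() directly on a tiny made-up data frame (dat and its values are hypothetical). By default model.frame() keeps the now-empty factor level after subsetting; as far as I know, glm() sets drop.unused.levels = TRUE internally, so you normally only need to think about this when building model frames yourself.

```r
# Hypothetical toy data
dat <- data.frame(
  action = factor(c("Clicked", "Viewed", "Did not use",
                    "Clicked", "Viewed", "Did not use")),
  conversion = c(1, 0, 0, 0, 1, 1)
)

# Default: the excluded level is still listed as a (now empty) factor level
mf_keep <- model.frame(conversion ~ action, data = dat,
                       subset = action != "Did not use")

# With drop.unused.levels = TRUE the empty level is removed
mf_drop <- model.frame(conversion ~ action, data = dat,
                       subset = action != "Did not use",
                       drop.unused.levels = TRUE)

levels(mf_keep$action)  # three levels, one of them unused
levels(mf_drop$action)  # two levels
```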

PS: note how I have specified the family; you don't need the odd combination of quoting and calling. Any one of binomial, binomial() or "binomial" should be fine, as the logit link is the canonical link for the binomial family and hence the default in R's binomial() family function. If you want to specify the link explicitly, use this form: binomial(link = "logit").
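If you want to convince yourself that these spellings are interchangeable, a quick sketch on simulated data (dat, x and y are made up for illustration) is:

```r
set.seed(42)

# Hypothetical data: one numeric predictor and a binary response
dat <- data.frame(x = rnorm(100))
dat$y <- rbinom(100, 1, plogis(0.5 * dat$x))

# The same model, with the family given in four different ways
fits <- list(
  glm(y ~ x, data = dat, family = binomial),
  glm(y ~ x, data = dat, family = binomial()),
  glm(y ~ x, data = dat, family = "binomial"),
  glm(y ~ x, data = dat, family = binomial(link = "logit"))
)

# All four should give the same coefficients
coefs <- lapply(fits, coef)
same <- vapply(coefs[-1],
               function(b) isTRUE(all.equal(b, coefs[[1]])),
               logical(1))
all(same)

# The default link of the binomial family
binomial()$link
```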

Gavin Simpson