Hi, I am fitting a linear regression in R and want to compute marginal (adjusted) means for my subgroups.

The model includes age, gender, education, marital status, employment (employ), ethnicity and health condition. I want the means and SDs for each ethnicity × health condition group. However, when I call emmeans to get the marginal means, I get this error message:

Error: The rows of your requested reference grid would be 19800, which exceeds
the limit of 10000 (not including any multivariate responses).
Your options are:
  1. Specify some (or more) nuisance factors using the 'nuisance' argument
     (see ?ref_grid). These must be factors that do not interact with others.
  2. Add the argument 'rg.limit = <new limit>' to the call. Be careful,
     because this could cause excessive memory use and performance issues.
     Or, change the default via 'emm_options(rg.limit = <new limit>)'.

Can anyone help, please? Thanks!

Russ Lenth
AFWWU

1 Answer

emmeans works by creating a grid of all combinations of the predictor levels - called the reference grid. In this case there are 19,800 such factor combinations. The estimated marginal means (EMMs) are obtained by averaging appropriate subsets of these 19,800 predictions.
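To make the grid size concrete, here is a minimal base-R sketch of how such a count arises. The numbers of levels are hypothetical (the question does not report them); they are chosen only so that their product matches the 19,800 in the error message. It also assumes age is numeric, in which case it contributes a single value to the grid by default rather than multiplying it.

```r
# Hypothetical numbers of levels per factor (not taken from the question);
# chosen only so that the product equals the 19,800 rows in the error.
n_levels <- c(gender = 2, education = 5, marital = 6,
              employ = 5, ethnicity = 6, health = 11)

# The reference grid has one row per combination of factor levels,
# so its size is the product of the numbers of levels.
grid_rows <- prod(n_levels)

# The same count, obtained by actually building all combinations.
grid <- expand.grid(lapply(n_levels, seq_len))
stopifnot(nrow(grid) == grid_rows)

# Number of ethnicity x health cells, and how many grid rows
# are averaged to produce the marginal mean for each cell.
n_cells <- prod(n_levels[c("ethnicity", "health")])
rows_per_cell <- grid_rows / n_cells

print(c(grid_rows = grid_rows, n_cells = n_cells, rows_per_cell = rows_per_cell))
```

With these assumed level counts, each ethnicity × health mean would be an average over all combinations of the remaining four factors.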

The message says that this reference grid is larger than an internal limit of 10,000 rows, which exists to keep us from consuming too much memory. But since your grid is less than twice the limit, chances are it will still work if you just increase the limit, as suggested in the second option in the error message; e.g.,

emmeans(..., rg.limit = 20000)

But you can also (via the first suggestion in the message) cut down the size of the grid by specifying nuisance variables; these must be factors that don't interact with other predictors in your model formula. Those factors are excluded from the grid by "pre-averaging" over them. So for example, if in your model formula, neither gender nor employ interact with your primary factors (nor with each other), you can do

emmeans(..., nuisance = c("gender", "employ"))

and that would be enough to bring the size of the reference grid well within the limit.
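Both options can be tried end to end on simulated data. The sketch below is only an illustration under stated assumptions: the data frame `dat`, the response `y`, the level counts, and the ethnicity-by-health interaction in the formula are all hypothetical stand-ins, since the question does not show the actual model. It requires the emmeans package to be installed.

```r
library(emmeans)

set.seed(1)
n <- 4000

# Helper for simulated factors; names and level counts are hypothetical.
sim_factor <- function(k, prefix) {
  factor(sample(paste0(prefix, seq_len(k)), n, replace = TRUE))
}

dat <- data.frame(
  age       = rnorm(n, mean = 45, sd = 12),
  gender    = sim_factor(2, "g"),
  education = sim_factor(5, "ed"),
  marital   = sim_factor(6, "m"),
  employ    = sim_factor(5, "emp"),
  ethnicity = sim_factor(6, "eth"),
  health    = sim_factor(11, "h")
)
dat$y <- rnorm(n)

# Assumed model: only ethnicity and health interact, so gender and
# employ are eligible as nuisance factors.
fit <- lm(y ~ age + gender + education + marital + employ +
            ethnicity * health, data = dat)

# Option 2 from the error message: raise the row limit.
emm_big <- emmeans(fit, ~ ethnicity * health, rg.limit = 20000)

# Option 1 from the error message: pre-average over nuisance factors.
emm_small <- emmeans(fit, ~ ethnicity * health,
                     nuisance = c("gender", "employ"))

head(as.data.frame(emm_small))
```

Either call returns one estimated marginal mean per ethnicity × health combination; the nuisance version simply builds a smaller grid to get there.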

Russ Lenth