
I am trying to write a function that calls glm() internally, but I always get an error message. It looks as though the function does not look up the variables passed as arguments in the data frame.

set.seed(234)
sex <- sample(c("M", "F"), size=100, replace=TRUE)
age <- rnorm(n=100, mean=20 + 4*(sex=="F"), sd=0.1)
dsn <- data.frame(sex, age)
rm(sex, age) #remove sex and age from the global environment for reproducibility

to_analyze <- function(dep, indep, data){
  glm(dep~factor(indep), data=data)
}

to_analyze(dep=age, indep=sex, data=dsn)
#> Error in eval(predvars, data, env): object 'age' not found


halfer
SimRock

3 Answers


You could use any of the following:

Using substitute:

to_analyze <- function(dep, indep, data){
  glm(substitute(dep ~ factor(indep)), data=data)
}

to_analyze(dep=age, indep=sex, data=dsn)

Advantage: indep can be an expression rather than a single variable name, for example:

 to_analyze(Petal.Width, Sepal.Length + Sepal.Width, data = iris)
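To see what the substitute version actually hands to glm, here is a minimal sketch. The helper name show_formula is hypothetical (it is not part of the answer); it returns the unevaluated expression instead of fitting a model.

```r
# Hypothetical helper: build the same expression as to_analyze(), but return it
show_formula <- function(dep, indep) {
  substitute(dep ~ factor(indep))
}

f <- show_formula(age, sex)
class(f)    # "call": an unevaluated expression, not yet a formula object
deparse(f)  # "age ~ factor(sex)"

# A compound indep is placed inside factor() as a whole
g <- show_formula(Petal.Width, Sepal.Length + Sepal.Width)
deparse(g)  # "Petal.Width ~ factor(Sepal.Length + Sepal.Width)"
```

Note that in the second case factor() is applied to the sum, not to each term separately.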

Using reformulate, as stated by @NelsonGon:

to_analyze <- function(dep, indep, data){ 
  glm(reformulate(sprintf("factor(%s)",indep), dep),  data = data) 
 }

Note that to call this function, the variable names must be passed as character strings:

 to_analyze(dep= "age", indep="sex", data=dsn)
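As a quick check of what reformulate builds from those strings, the following sketch constructs the formula on its own, without fitting anything. The second variable name "grp" is hypothetical and only illustrates that the term labels can be a vector.

```r
# Build the formula from character strings
fmla <- reformulate(sprintf("factor(%s)", "sex"), response = "age")
class(fmla)    # "formula"
deparse(fmla)  # "age ~ factor(sex)"

# sprintf is vectorised, so several predictors can be wrapped at once
# ("grp" is a made-up column name for illustration)
fmla2 <- reformulate(sprintf("factor(%s)", c("sex", "grp")), response = "age")
deparse(fmla2) # "age ~ factor(sex) + factor(grp)"
```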

Recall that glm can also take a string that can be parsed into a formula:

to_analyze <- function(dep, indep, data){ 
  glm(sprintf("%s~factor(%s)", dep, indep),  data = data) 
}

to_analyze("age", "sex", data=dsn)

or even:

to_analyze <- function(dep, indep, data){ 
  glm(paste(dep,"~ factor(",indep,")"),  data = data) 
}

to_analyze("age", "sex", data=dsn)

Lastly, to combine substitute and paste:

to_analyze <- function(dep, indep, data){ 
  glm(paste(substitute(dep),"~ factor(",substitute(indep),")"),  data = data) 
}

This works for both symbols and character strings, e.g.:

to_analyze(age, sex, data=dsn)
to_analyze("age", "sex", data=dsn)
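The reason both calls behave the same can be seen by looking at the string that paste produces. This is a small sketch with a hypothetical helper, build_string, that returns the string instead of passing it to glm.

```r
# Hypothetical helper: same paste/substitute expression as above, returned as text
build_string <- function(dep, indep) {
  paste(substitute(dep), "~ factor(", substitute(indep), ")")
}

s1 <- build_string(age, sex)      # symbols are coerced to their names
s2 <- build_string("age", "sex")  # strings pass through unchanged
s1                 # "age ~ factor( sex )"
identical(s1, s2)  # TRUE

# Caveat: a compound expression is coerced element-wise, so the result is
# no longer a single formula string
s3 <- build_string(age, a + b)
length(s3)         # 3
```

So this variant is best kept to single variable names; for compound expressions the plain substitute version is the safer choice.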
Onyambu
  • Thanks again @Onyambu. It is nice to have lots of options – SimRock Feb 28 '20 at 19:08
  • Thanks again @Onyambu. They work well, but I found another issue (see below): apparently it does not support additional calls. Any help would be greatly appreciated – SimRock Feb 29 '20 at 00:05

Create a "formula" object in the function and pass to glm.

To capture the variable names without triggering an error, the standard trick is deparse(substitute(.)).
Then compose the formula with paste.

to_analyze <- function(dep, indep, data){
  dep <- deparse(substitute(dep))
  indep <- deparse(substitute(indep))
  indep <- paste0("factor(", indep, ")")
  fmla <- paste(dep, indep, sep = " ~ ")
  fmla <- as.formula(fmla)
  glm(fmla, data = data)
}

to_analyze(dep=age, indep=sex, data=dsn)
#
#Call:  glm(formula = fmla, data = data)
#
#Coefficients:
# (Intercept)  factor(sex)M  
#      23.984        -3.984  
#
#Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
#Null Deviance:     396.2 
#Residual Deviance: 0.837   AIC: -188.5
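The printed Call above shows glm(formula = fmla, data = data) rather than the formula itself. If the real formula should appear in the stored call, one possible variant (a sketch, not part of the answer; the name to_analyze2 is hypothetical) is to build the call with do.call so the formula object is inserted directly:

```r
# Recreate the example data from the question
set.seed(234)
sex <- sample(c("M", "F"), size = 100, replace = TRUE)
age <- rnorm(n = 100, mean = 20 + 4 * (sex == "F"), sd = 0.1)
dsn <- data.frame(sex, age)
rm(sex, age)

# Hypothetical variant: same formula construction, but glm is invoked via do.call
to_analyze2 <- function(dep, indep, data) {
  dep   <- deparse(substitute(dep))
  indep <- deparse(substitute(indep))
  fmla  <- as.formula(sprintf("%s ~ factor(%s)", dep, indep))
  # The formula object itself goes into the call; `data` stays a symbol
  do.call("glm", list(formula = fmla, data = quote(data)))
}

fit <- to_analyze2(dep = age, indep = sex, data = dsn)
deparse(fit$call$formula)  # "age ~ factor(sex)"
```

The stored call still refers to the data by the local name data, so this changes only how the formula is recorded.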
Rui Barradas

@Onyambu and others: the substitute approach works well for a single call, as in to_analyze(). However, when I call another function inside the wrapper, it complains again. Any help would be greatly appreciated.

to_analyze <- function(dep, indep, data){
  glm(substitute(dep ~ factor(indep)), data=data)
}

to_analyze(dep=age, indep=sex, data=dsn)
#> 
#> Call:  glm(formula = substitute(dep ~ factor(indep)), data = data)
#> 
#> Coefficients:
#>  (Intercept)  factor(sex)M  
#>       24.006        -4.034  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
#> Null Deviance:       397.3 
#> Residual Deviance: 0.8152    AIC: -191.2

However, I am stuck again because I am trying to pass the fitted model to lsmeans::lsmeans() to predict marginal means and return the result, but it gives me an error. Although the model does not need an offset, I am including one here so that I end up with more general code that I can modify later. Any help would be greatly appreciated.

to_predict_lsmeans <- function(dep, indep, data){
  model <- glm(substitute(dep ~ factor(indep)), data=data)
  pred <- lsmeans:: lsmeans(model, substitute(~ factor(indep)), offset=substitute(data)$log(age), type ="response" )
  return(pred)
}

pred <- to_predict_lsmeans(dep=age, indep=sex, data=dsn)
#> Error in ref_grid(object, ...): We are unable to reconstruct the data.
#> The variables needed are:
#>  sex
#> Are any of these actually constants? (specify via 'params = ')
#> The dataset name is:
#>  data
#> Does the data still exist? Or you can specify a dataset via 'data = '
pred
#> Error in eval(expr, envir, enclos): object 'pred' not found
SimRock
  • what exactly is `substitute(data)$log(age)`??? There is nothing like this in R. If you have a dataframe, you do `log(data$age)` and not `data$log(age)`. Also, you should never substitute your data. Check my code. Data was not substituted. Lastly, you should use the data argument within `lsmeans`. Thus your code should look like: `lsmeans:: lsmeans(model, substitute(~ factor(indep)), offset=log(data$age),data=data, type ="response" )` – Onyambu Feb 29 '20 at 00:30
  • Thank you. It still does not seem to work. `lsmeans()` does not seem to take a data argument except to call a variable directly. When I do the following `dep=dsn$age; indep=dsn$sex; data=dsn; model <- glm(dep ~ factor(indep), data=data); lsmeans::lsmeans(model, ~ factor(indep), offset=log(data$age), type ="response")` it works, but when I put it in the function it does not work. `to_predict_lsmeans <- function(dep, indep, data){ model <- glm(substitute(dep ~ factor(indep)), data=data); lsmeans::lsmeans(model, substitute(~ factor(indep)), offset=log(data$age), type ="response") return(pred)}` – SimRock Feb 29 '20 at 00:45
  • `lsmeans` does take data. What you are doing works because indep and dep are in the `.GlobalEnv`, so `lsmeans` can look them up there. E.g. even `glm(dep~factor(indep))` would still work, since `dep` and `indep` are in the global environment. What if they are not in that environment? Then you should use `data = dsn` in both `glm` and `lsmeans`. If it doesn't work at all, use `as.formula(substitute(~factor(indep)))` – Onyambu Feb 29 '20 at 00:54
  • Thank you. But still no luck. I fixed the code as you suggested: `to_predict_lsmeans <- function(dep, indep, data){ model <- glm(substitute(dep ~ factor(indep)), data=data) lsmeans::lsmeans(model, as.formula(substitute(~ factor(indep))), offset=log(data$age), data = data, type ="response") return(pred) } pred <- to_predict_lsmeans(dep=age, indep=sex, data=dsn) pred` It returns the following: `function(dep=age, indep=sex, data=dsn) pred <- to_predict_lsmeans(dep=age, indep=sex, data=dsn)`. Did your code work? – SimRock Feb 29 '20 at 01:11
  • How does a function return a function? You must have written a function within a function for it to do that. It seems you have a much bigger problem than what you originally asked – Onyambu Mar 01 '20 at 23:17
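For the simple one-factor Gaussian model in this thread, the per-level means can also be obtained with base R alone. The sketch below is a stand-in using predict(), not a fix for the lsmeans call. It ignores the offset, and the function name to_predict_means is hypothetical.

```r
# Recreate the example data from the question
set.seed(234)
sex <- sample(c("M", "F"), size = 100, replace = TRUE)
age <- rnorm(n = 100, mean = 20 + 4 * (sex == "F"), sd = 0.1)
dsn <- data.frame(sex, age)
rm(sex, age)

# Hypothetical wrapper: fit the model, then predict once per level of indep
to_predict_means <- function(dep, indep, data) {
  dep   <- deparse(substitute(dep))
  indep <- deparse(substitute(indep))
  fmla  <- as.formula(sprintf("%s ~ factor(%s)", dep, indep))
  model <- glm(fmla, data = data)
  # One row per observed level, stored under the original column name
  newdata <- data.frame(sort(unique(data[[indep]])))
  names(newdata) <- indep
  newdata$fit <- predict(model, newdata = newdata, type = "response")
  newdata
}

pred <- to_predict_means(dep = age, indep = sex, data = dsn)
pred
```

With a single factor and the default Gaussian family, these fitted values coincide with the raw group means, which gives an easy sanity check on the wrapper.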