I am writing an R package, in which users write formulas that look like this:

outcome ~ var1 + var2 + mm(id, mmc(var3, var4), mmw(pupils^exp(teacher*b)))

The right-hand side includes variable names and the element mm(), which itself contains a variable name (id) and the elements mmc() and mmw().

I would like to separate mm(), mmc(), mmw(), i.e. end up with variables

mm  = id, mmc(var3, var4), mmw(pupils^exp(teacher*b))
mmc = var3, var4
mmw = pupils^exp(teacher*b)

Is my only option to parse the formula as a character string and then use regex to separate the elements, or are there more elegant ways to handle this, given that it is a formula?

I have tried

all.vars
all.names

but they break up mmw() too much, since mmw() typically contains nonlinear functional relationships.
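
To make the problem concrete, this is roughly what the two helpers do with the example formula. Both return a flat character vector, so the grouping inside mm(), mmc() and mmw() is lost. The names vars and nms are only used for this illustration.

```r
fo <- outcome ~ var1 + var2 + mm(id, mmc(var3, var4), mmw(pupils^exp(teacher * b)))

# all.vars() returns each variable name once, with no nesting information
vars <- all.vars(fo)
print(vars)
# "outcome" "var1" "var2" "id" "var3" "var4" "pupils" "teacher" "b"

# all.names() also returns operators and function names, again as a flat vector
nms <- all.names(fo)
print(nms)
```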

Ben
  • Does this help? http://www.cookbook-r.com/Formulas/Extracting_components_from_a_formula/ – GordonShumway Sep 25 '19 at 17:58
  • The problem with using indices is that the order might be unexpected: **outcome ~ mm(id, mmc(var3, var4), mmw(pupils^exp(teacher*b))) + var1 + var2**. Is there a way to get the right index for mm(), mmc(), and mmw()? – Ben Sep 25 '19 at 18:09

1 Answer

1) Using getTerms from "Terms of a sum in a R expression", we can parse the formula directly, without regular expressions. First we get the terms tt. Then we take mm to be the first term with more than one element, that is, the first term that is a call rather than a bare variable name, and extract mmc and mmw from it. No packages are used.
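
getTerms itself is defined in the linked answer and is not reproduced here. As a minimal stand-in, assuming all it has to do is split a sum into a list of its terms, something like the following would work. The body is a sketch written for this example and is not necessarily the code from that answer.

```r
# Stand-in sketch for getTerms (an assumption, not the linked answer's code):
# recursively split a binary `+` call into a flat list of its terms.
getTerms <- function(e) {
  if (is.call(e) && length(e) == 3 && identical(e[[1]], as.name("+"))) {
    c(getTerms(e[[2]]), getTerms(e[[3]]))
  } else {
    list(e)  # a symbol, a constant, or any call other than a binary `+`
  }
}

fo <- outcome ~ var1 + var2 + mm(id, mmc(var3, var4), mmw(pupils^exp(teacher * b)))
tt <- getTerms(fo[[3]])  # fo[[3]] is the right-hand side of the formula
print(lengths(tt))       # 1 1 4: two symbols and the four-element mm(...) call
```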

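fo <- outcome ~ var1 + var2 + mm(id, mmc(var3, var4), mmw(pupils^exp(teacher * b)))

# fo[[3]] is the right-hand side; getTerms splits it into a list of terms
tt <- getTerms(fo[[3]])
# the first term of length > 1 is the mm(...) call; [-1] drops the function name
mm <- as.list(tt[lengths(tt) > 1][[1]])[-1]
mmc <- as.list(mm[[2]][-1])  # arguments of mmc()
mmw <- as.list(mm[[3]][-1])  # arguments of mmw()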

giving:

> mm
[[1]]
id

[[2]]
mmc(var3, var4)

[[3]]
mmw(pupils^exp(teacher * b))

> mmc
[[1]]
var3

[[2]]
var4

> mmw
[[1]]
pupils^exp(teacher * b)
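
The selection above relies on position: mm is the first term of length greater than one, and mmc and mmw are its second and third arguments. If the terms may appear in a different order, or if other calls such as log(var1) can occur, one option is to select by function name instead. The sketch below assumes a reordered formula made up for illustration, and split_sum and is_call_to are hypothetical helper names.

```r
# Reordered formula with an extra call, invented for this illustration
fo <- outcome ~ log(var1) + mm(id, mmw(pupils^exp(teacher * b)), mmc(var3, var4)) + var2

# Hypothetical helper: split a sum into a flat list of its terms
split_sum <- function(e) {
  if (is.call(e) && length(e) == 3 && identical(e[[1]], as.name("+"))) {
    c(split_sum(e[[2]]), split_sum(e[[3]]))
  } else {
    list(e)
  }
}

# Hypothetical helper: TRUE if `term` is a call to the function named `fn`
is_call_to <- function(term, fn) {
  is.call(term) && identical(term[[1]], as.name(fn))
}

tt <- split_sum(fo[[3]])

# Pick each piece by name, wherever it sits; [-1] drops the function name
mm_call <- Filter(function(term) is_call_to(term, "mm"), tt)[[1]]
mm  <- as.list(mm_call)[-1]
mmc <- as.list(Filter(function(a) is_call_to(a, "mmc"), mm)[[1]])[-1]
mmw <- as.list(Filter(function(a) is_call_to(a, "mmw"), mm)[[1]])[-1]
```

This still assumes that exactly one mm() term is present and that it contains one mmc() and one mmw(); a package would probably want to check that and give an informative error otherwise.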

2) Alternately, we might build the processing directly into getTerms, giving getMs as follows:

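getMs <- function(e, x = list()) {
  if (length(e) == 1) x                        # symbol or constant: nothing to add
  else if (identical(e[[1]], as.name("+")))    # sum: collect from both sides
    c( Recall(e[[2]], x), Recall(e[[3]], x) )
  # call to mm, mmw or mmc (is.name guards against heads such as pkg::f)
  else if (is.name(e[[1]]) && as.character(e[[1]]) %in% c("mm", "mmw", "mmc")) {
      for(i in 2:length(e)) x <- Recall(e[[i]], x)                # recurse into the arguments
      c(setNames(list(as.list(e[-1])), as.character(e[[1]])), x)  # prepend own arguments, by name
  } else x                                     # other calls (e.g. ^, exp) are not descended into
}
res <- getMs(fo[[3]])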
str(res)

giving:

List of 3
 $ mm :List of 3
  ..$ : symbol id
  ..$ : language mmc(var3, var4)
  ..$ : language mmw(pupils^exp(teacher * b))
 $ mmw:List of 1
  ..$ : language pupils^exp(teacher * b)
 $ mmc:List of 2
  ..$ : symbol var3
  ..$ : symbol var4
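
The extracted pieces are symbols and calls, not strings. If the package needs them as text, or needs the variables inside one piece, base R can convert them after the fact. A minimal sketch, with the mmw element rebuilt via quote() so that it stands alone; expr_to_string is a hypothetical helper name.

```r
# One extracted piece, rebuilt with quote() so this snippet runs on its own
mmw <- list(quote(pupils^exp(teacher * b)))

# Hypothetical helper: deparse() can return several strings for a long
# expression, so collapse them into a single string
expr_to_string <- function(e) paste(deparse(e), collapse = " ")

mmw_chr  <- vapply(mmw, expr_to_string, character(1))
mmw_vars <- all.vars(mmw[[1]])  # variables used inside this one piece

print(mmw_chr)   # "pupils^exp(teacher * b)"
print(mmw_vars)  # "pupils" "teacher" "b"
```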
G. Grothendieck