I have a formula that looks as follows:

formula <- as.formula(y ~ x + as.factor(z) + A + as.factor(B) + C:as.factor(A) + as.factor(D) + E + F + as.factor(G))

I would like to extract the names of all variables that are wrapped in as.factor(), so that I can convert those columns to factors. all.vars(formula) returns every variable in the formula, not just the ones inside as.factor().

Desired result:

factornames <- c("z", "B", "A", "D", "G")

I eventually want to feed the selected variables to:

# Convert the selected columns to factors
DF[factornames] <- lapply(DF[factornames], factor)
# Expand the factor columns into dummy variables
DF <- as.data.frame(model.matrix(phantom ~ ., transform(DF, phantom = 0)))
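
To make the intended downstream step concrete, here is a minimal sketch on a made-up data frame. The column values and the two-element factornames vector are assumptions for illustration only, not the real data.

```r
# Toy data (hypothetical values): one numeric column and two columns
# that should be treated as categorical
DF <- data.frame(
  x = c(1.5, 2.0, 3.5, 4.0),
  z = c("a", "b", "a", "c"),
  B = c(1, 2, 2, 1)
)
factornames <- c("z", "B")

# Convert the selected columns to factors
DF[factornames] <- lapply(DF[factornames], factor)

# Expand the factors into dummy columns; phantom is a throwaway response
# that only exists so the formula has a left-hand side
dummies <- as.data.frame(model.matrix(phantom ~ ., transform(DF, phantom = 0)))
names(dummies)
```

With the default treatment contrasts, the first level of each factor is the reference level and gets no dummy column of its own.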
Tom

2 Answers

You can do some string manipulation to get the column names which are factors.

factornames <- stringr::str_match_all(as.character(formula)[3], 'as\\.factor\\(([A-Za-z])\\)')[[1]][,-1]
factornames
#[1] "z" "B" "A" "D" "G"

The ([A-Za-z]) part of the regex matches a single letter only, so adjust it to fit the column names in your data (for example, ([A-Za-z0-9._]+) for longer names).
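
For column names longer than one character, the same idea can be sketched in base R with a wider capture group. The pattern below is an assumption about what the names look like (letters, digits, dots, underscores), and rhs and pattern are illustrative names only.

```r
# Same formula as in the question
formula <- y ~ x + as.factor(z) + A + as.factor(B) + C:as.factor(A) +
  as.factor(D) + E + F + as.factor(G)

# Right-hand side of the formula as a single string
rhs <- paste(deparse(formula[[3]], width.cutoff = 500L), collapse = "")

# Capture a whole variable name inside as.factor(), not just one letter
pattern <- "as\\.factor\\(([A-Za-z.][A-Za-z0-9._]*)\\)"
matches <- regmatches(rhs, gregexpr(pattern, rhs))[[1]]

# Keep only the captured name and drop duplicates
factornames <- unique(sub(pattern, "\\1", matches))
factornames
#[1] "z" "B" "A" "D" "G"
```

This still only handles as.factor() applied directly to a bare name; something like as.factor(log(z)) would not match.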

Ronak Shah

We can deparse the formula, then use gregexpr to match everything in parentheses that is preceded by "factor", using this historic solution.

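# Collapse the deparsed formula into a single string
r <- paste(deparse(formula), collapse = "")
# Lazily match whatever sits between "factor(" and the next ")"
regmatches(r, gregexpr("(?<=factor\\().*?(?=\\))", r, perl = TRUE))[[1]]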
# [1] "z" "B" "A" "D" "G"
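
As an alternative to regular expressions, the formula can also be inspected as a call tree. The following is only a sketch: find_factor_vars is a hypothetical helper, not part of base R, and it assumes the wrappers are written as plain as.factor() or factor() calls (a namespaced call such as base::as.factor(z) would be missed).

```r
# Same formula as in the question
formula <- y ~ x + as.factor(z) + A + as.factor(B) + C:as.factor(A) +
  as.factor(D) + E + F + as.factor(G)

# Hypothetical helper: walk the call tree and collect the variables
# that appear inside as.factor() or factor() calls
find_factor_vars <- function(expr) {
  # Symbols and constants cannot contain a factor call
  if (!is.call(expr)) return(character(0))
  fn <- expr[[1]]
  if (is.name(fn) && as.character(fn) %in% c("as.factor", "factor")) {
    return(all.vars(expr))
  }
  # Otherwise recurse into the arguments of the call
  unlist(lapply(as.list(expr)[-1], find_factor_vars))
}

factornames <- unique(find_factor_vars(formula))
factornames
# [1] "z" "B" "A" "D" "G"
```

Because this works on the parsed expression, it does not depend on how deparse() formats or wraps the formula text.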
jay.sf