
I'm trying to write the following formula:

everything_time <- system.time(
  everything_rsf <- rfsrc(Surv(time = PFS, event = Censoring, type = c("right"))
                          ~ . - c(Sample.ID, PatientID, UniqueID, PFS, Censoring),
                          data = data_endogenous))

As you can see, I try to remove a whole bunch of variables. Yet when I check everything_rsf$importance, Sample.ID, PatientID and UniqueID are still in the model. I'm not sure why.

I've tried listing them out individually as well.

desertnaut
user1357015
  • Can you explain what led you to believe that that syntax would successfully remove or exclude variables from the formula? – joran Jun 23 '14 at 17:31
  • The model solution here shows how to remove one variable. I need to remove several. I tried listing them all out individually as well. http://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame – user1357015 Jun 23 '14 at 17:31
  • Ok, you'll note that there's no use of `c()` there, so that _definitely_ won't work. – joran Jun 23 '14 at 17:33
  • Before commenting on your 'listing them out individually', I'd want to see the exact syntax you used for that too. – joran Jun 23 '14 at 17:34
  • Maybe instead of `data = data_endogenous` do something like `data = data_endogenous[, !(names(data_endogenous) %in% c("Sample.ID","PatientID","UniqueID","PFS","Censoring"))]`? – David Arenburg Jun 23 '14 at 17:34
  • @joran The listing out individually went like this: `~. -Sample.ID -PatientID -UniqueID` – user1357015 Jun 23 '14 at 17:48
  • @DavidArenburg: Yeah, I ended up doing something very similar, just cutting the matrix ahead of time by removing the columns. That gave me the stats I wanted, but I'm curious how the formula would work from a programming point of view when I want to remove variables. – user1357015 Jun 23 '14 at 17:51
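The point joran raises about `c()` can be checked without fitting anything, by asking `terms()` (from the stats package) which terms a formula keeps. This is a minimal sketch assuming a hypothetical toy data frame `df` with columns `y`, `a`, `b` and `x` in place of the real data. It shows standard base-R formula parsing only; `rfsrc()` comes from randomForestSRC and may process its formula differently.

```r
# Hypothetical toy data frame standing in for data_endogenous
df <- data.frame(y = c(1.2, 0.4, 2.2, 1.8, 0.9),
                 a = 1:5, b = 6:10,
                 x = c(0.3, 1.1, 0.7, 1.5, 0.2))

# `c(a, b)` is parsed as a single term literally named "c(a, b)". It is not
# one of the columns that `.` expands to, so subtracting it removes nothing.
with_c <- terms(y ~ . - c(a, b), data = df)
attr(with_c, "term.labels")
#> [1] "a" "b" "x"

# Subtracting each column with its own `-` drops it from the expansion of `.`
separate <- terms(y ~ . - a - b, data = df)
attr(separate, "term.labels")
#> [1] "x"
```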

1 Answer


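A formula term can't be a list or vector of variable names, so subtracting c(...) doesn't remove anything. Instead, build the formula as a string and convert it with as.formula().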

data(iris)

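# Builds the formula Species ~ . - Sepal.Length - Petal.Width
fmla <- as.formula(paste("Species ~",
                   paste(c(".", "Sepal.Length", "Petal.Width"), collapse = " - ")))

# binomial() with a factor response treats the first level (setosa) as failure
glm(fmla, data = iris, family = binomial(link = "logit"))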
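Because the pasted string is only text until as.formula() parses it, it can help to confirm which predictors survive before fitting. A short sketch reusing the iris example; `drop` is a hypothetical name for the vector of columns to exclude.

```r
data(iris)

# Columns to exclude (hypothetical; swap in your own names)
drop <- c("Sepal.Length", "Petal.Width")

# Builds the formula Species ~ . - Sepal.Length - Petal.Width
fmla <- as.formula(paste("Species ~ . -", paste(drop, collapse = " - ")))

# With `data` supplied, terms() expands `.` to the columns of iris other than
# the response, then applies the subtractions
attr(terms(fmla, data = iris), "term.labels")
#> [1] "Sepal.Width"  "Petal.Length"
```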

You can also build the formula from a chosen set of predictors, for example every column whose name contains "Width":

data(iris)

# Builds the formula Species ~ Sepal.Width + Petal.Width
fmla <- as.formula(paste("Species ~",
                   paste(grep("Width", names(iris), value = TRUE), collapse = " + ")))

glm(fmla, data = iris, family = binomial(link = "logit"))
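When listing predictors to include, stats::reformulate() can build the formula from a character vector directly, which skips the manual paste() step. This is an alternative to the approach above, not what the answer itself uses; `preds` is an illustrative name.

```r
data(iris)

# Predictors selected by name pattern: "Sepal.Width" and "Petal.Width"
preds <- grep("Width", names(iris), value = TRUE)

# reformulate() joins the term labels with "+" and attaches the response,
# giving Species ~ Sepal.Width + Petal.Width
fmla <- reformulate(preds, response = "Species")
```

The resulting fmla can be passed to glm() in the same way as the pasted version.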

You can combine these approaches to remove and add predictors as needed while leaving the dataset unchanged. This is useful when fitting several models to the same dataset.
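Applied to the question's setup, excluding columns by name can be wrapped in a small helper that builds the right-hand side with setdiff() and stats::reformulate(). This is a sketch under assumptions: `formula_without()` is a hypothetical helper, the toy data frame only mimics the asker's column names with made-up values, and it is not verified here that rfsrc() treats the result exactly like base-R modelling functions do. The helper also drops every variable used in the response (PFS and Censoring inside Surv()), so they never end up as predictors however `.` would have been expanded.

```r
# Hypothetical helper: `response` on the left, and on the right every column
# of `data` except the variables used in the response and those in `drop`
formula_without <- function(data, response, drop = character()) {
  # all.vars() extracts PFS and Censoring from Surv(PFS, Censoring)
  keep <- setdiff(names(data), c(all.vars(response), drop))
  reformulate(keep, response = response)
}

# Toy data frame mimicking the question's column names (made-up values)
toy <- data.frame(Sample.ID = 1:3, PatientID = 4:6, UniqueID = 7:9,
                  PFS = c(10, 20, 30), Censoring = c(1, 0, 1),
                  age = c(50, 61, 47), stage = c(1, 2, 2))

# The response is passed unevaluated, so the survival package is not needed
# just to build the formula: Surv(PFS, Censoring) ~ age + stage
f <- formula_without(toy,
                     response = quote(Surv(PFS, Censoring)),
                     drop = c("Sample.ID", "PatientID", "UniqueID"))
```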

Matt L.