4

I want use survfit() and basehaz() inside a function, but they do not work. Could you take a look at this problem. Thanks for your help. The following code leads to the error:

library(survival)

n <- 50      # total sample size
nclust <- 5  # number of clusters
clusters <- rep(1:nclust,each=n/nclust)
beta0 <- c(1,2)
set.seed(13)
#generate phmm data set
Z <- cbind(Z1=sample(0:1,n,replace=TRUE),
       Z2=sample(0:1,n,replace=TRUE),
       Z3=sample(0:1,n,replace=TRUE))
b <- cbind(rep(rnorm(nclust),each=n/nclust),rep(rnorm(nclust),each=n/nclust))
Wb <- matrix(0,n,2)
for( j in 1:2) Wb[,j] <- Z[,j]*b[,j]
Wb <- apply(Wb,1,sum)
T <- -log(runif(n,0,1))*exp(-Z[,c('Z1','Z2')]%*%beta0-Wb)
C <- runif(n,0,1)
time <- ifelse(T<C,T,C)
event <- ifelse(T<=C,1,0)
mean(event)
phmmd <- data.frame(Z)
phmmd$cluster <- clusters
phmmd$time <- time
phmmd$event <- event

fmla <- as.formula("Surv(time, event) ~ Z1 + Z2")

BaseFun <- function(x){
start.coxph <- coxph(x, phmmd)  

print(start.coxph)

betahat <- start.coxph$coefficient
print(betahat) 
print(333)  
print(survfit(start.coxph))                                                                                                                                                                                                                                     
m <- basehaz(start.coxph)
print(m)
}  
BaseFun(fmla)
Error in formula.default(object, env = baseenv()) : invalid formula

But the following function works:

fit <- coxph(fmla, phmmd)    
basehaz(fit)
Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
moli
  • 129
  • 2
  • 8

2 Answers2

5

It is a problem of scoping. Notice that the environment of basehaz is:

environment(basehaz)
<environment: namespace:survival>

meanwhile:

environment(BaseFun)
<environment: R_GlobalEnv>

Therefore that is why the function basehaz cannot find the local variable inside the function.

A possible solution is to send x to the top using assign:

 BaseFun <- function(x){

    assign('x',x,pos=.GlobalEnv)

    start.coxph <- coxph(x, phmmd)  
    print(start.coxph)

    betahat <- start.coxph$coefficient
    print(betahat) 
    print(333)  
    print(survfit(start.coxph)) 

    m <- basehaz(start.coxph)
    print(m) 
    rm(x)

       }  
    BaseFun(fmla)

Other solutions may involved dealing with the environments more directly.

aatrujillob
  • 4,738
  • 3
  • 19
  • 32
  • Nice find. Any special reason to assign `m` to global environment? – Roman Luštrik Jan 25 '12 at 10:06
  • @RomanLuštrik no reason that is a typo actually. txs for catching it. – aatrujillob Jan 25 '12 at 10:19
  • 1
    @AndresT Thank you very much. It works. Terry Therneau also provided a solution: The survfit routine needs to reconstruct the model matrix, and by default in R this is done in the context where the model formula was first defined. Unfortunately this is outside the function, leading to problems -- your argument "x" is is unknown in the outer envirnoment. "The solution is to add "model=TRUE" to the coxph call so that the model frame is saved and survfit doesn't have to do reconstruction." – moli Jan 26 '12 at 00:06
2

I'm following up on @moli's comment to @aatrujillob's answer. They were helpful so I thought I would explain how it solved things for me and a similar problem with the rpart and partykit packages.

Some toy data:

N <- 200
data <- data.frame(X = rnorm(N),W = rbinom(N,1,0.5))
data <-  within( data, expr = {
  trtprob <- 0.4 + 0.08*X + 0.2*W -0.05*X*W
  Trt <- rbinom(N, 1, trtprob)
  outprob <- 0.55 + 0.03*X -0.1*W - 0.3*Trt
  Outcome <- rbinom(N,1,outprob)
  rm(outprob, trtprob)
})

I want to split the data to training (train_data) and testing sets, and train the classification tree on train_data.

Here's the formula I want to use, and the issue with the following example. When I define this formula, the train_data object does not yet exist.

my_formula <- Trt~W+X 
exists("train_data")
# [1] FALSE
exists("train_data", envir = environment(my_formula))
# [1] FALSE

Here's my function, which is similar to the original function. Again,

badFunc <- function(data, my_formula){
  train_data <- data[1:100,]
  ct_train <- rpart::rpart(
    data= train_data,
    formula = my_formula,
    method = "class")
  ct_party <- partykit::as.party(ct_train)
}

Trying to run this function throws an error similar to OP's.

library(rpart)
library(partykit)

bad_out <- badFunc(data=data, my_formula = my_formula)
# Error in is.data.frame(data) : object 'train_data' not found 
# 10.   is.data.frame(data) 
# 9.    model.frame.default(formula = Trt ~ W + X, data = train_data, 
#          na.action = function (x) {Terms <- attr(x, "terms") ... 
# 8.    stats::model.frame(formula = Trt ~ W + X, data = train_data, 
#          na.action = function (x) {Terms <- attr(x, "terms") ... 
# 7.    eval(expr, envir, enclos) 
# 6.    eval(mf, env) 
# 5.    model.frame.rpart(obj) 
# 4.    model.frame(obj) 
# 3.    as.party.rpart(ct_train) 
# 2.    partykit::as.party(ct_train) 
# 1.    badFunc(data = data, my_formula = my_formula) 

print(bad_out)
# Error in print(bad_out) : object 'bad_out' not found

Luckily, rpart() is like coxph() in that you can specify the argument model=TRUE to solve these issues. Here it is again, with that extra argument.

goodFunc <- function(data, my_formula){
  train_data <- data[1:100,]
  ct_train <- rpart::rpart(
    data= train_data,
    ## This solved it for me
    model=TRUE,
    ## 
    formula = my_formula,
    method = "class")
  ct_party <- partykit::as.party(ct_train)
}
good_out <- goodFunc(data=data, my_formula = my_formula)
print(good_out)    
# Model formula:
# Trt ~ W + X
# 
# Fitted party:
# [1] root
# |   [2] X >= 1.59791: 0.143 (n = 7, err = 0.9)
##### etc

documentation for model argument in rpart():

model:

if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart function), then this frame is used rather than constructing new data.

Formulas can be tricky as they use lexical scoping and environments in a way that is not always natural (to me). Thank goodness Terry Therneau has made our lives easier with model=TRUE in these two packages!

BarkleyBG
  • 664
  • 5
  • 16