As with other machine learning methods, I split my original data set into a training set and a test set at a 7:3 ratio.
Here is my code.
install.packages("randomForestSRC")  # the package name must be a quoted string
library(randomForestSRC)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
train <- sample(1:nrow(data), round(nrow(data) * 0.70))
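# Grow the forest on the training rows
data.grow <- rfsrc(Surv(days, status) ~ .,
                   data[train, ],
                   ntree = 100,
                   tree.err = TRUE,
                   importance = TRUE,
                   nsplit = 1,
                   proximity = TRUE)
# Predict on the held-out rows (these still contain days and status)
data.pred <- predict(data.grow,
                     data[-train, ],
                     importance = TRUE,
                     tree.err = TRUE)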
My question is about the predict function in this code.
Originally, I wanted to build a prediction model based on a random survival forest to predict disease development.
For example, after building the model on the training data set, I wanted to estimate the probability of disease development for test data that contains no information about disease incidence. In other words, I would like to predict each individual's probability of developing the disease from general characteristics such as age, BMI, and sex.
However, contrary to what I intended, the "predict" function in this package did not work on data that has no status information (event/censored).
It seems that the "predict" function requires outcome information (event/censored).
Therefore, I cannot understand what the "predict" function actually does.
If the "predict" function works only with outcome information, how can I predict disease development for future subjects based only on their general characteristics?
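To make this concrete, here is a minimal sketch of the kind of call I have in mind. It assumes that predict() accepts new data with the outcome columns removed as long as importance and error-rate options are not requested (I presume those need observed outcomes to be computed). The names fit, newdata, and pred.new, as well as the seed, are only illustrative.

```r
library(randomForestSRC)

data(pbc, package = "randomForestSRC")
data <- na.omit(pbc)

set.seed(1)  # arbitrary seed, only so the split is reproducible
train <- sample(seq_len(nrow(data)), round(nrow(data) * 0.70))

# Grow the forest on the training rows; outcomes are required at this stage.
fit <- rfsrc(Surv(days, status) ~ ., data[train, ], ntree = 100)

# Hypothetical "future" subjects: held-out rows with the outcome columns dropped.
newdata <- data[-train, setdiff(names(data), c("days", "status"))]

# No importance or error-rate arguments here (assumption: they need outcomes).
pred.new <- predict(fit, newdata)

# Estimated survival curves: one row per subject,
# one column per event time stored in time.interest.
dim(pred.new$survival)
head(pred.new$time.interest)
```

If this works as I assume, 1 - pred.new$survival[i, j] would be the estimated probability that subject i has developed the disease by time pred.new$time.interest[j].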
In addition, if prediction in this model requires the outcome information, what does "predict" mean in a random survival forest model?
Please let me know what the "predict" function in this package does.
Thank you for reading my long question.