I am attempting to run a survival analysis on several genes using a function. The patients are grouped by gene expression: 'high' (2) or 'low' (1).
I'm having trouble with how R interprets my code. Here is some sample data:
df <- read.table(header=TRUE, text="TGM5 TGM6 TGM7 TPI1 survival vital.status
2 1 1 2 1.419178082 2
2 1 1 1 5 1
2 1 1 2 1.082191781 2
1 1 1 1 0.038356164 1
2 1 2 2 0.77260274 2
1 1 2 2 2.336986301 1
2 1 2 1 1.271232877 1")
The following code works fine:
library(survival)
fit <- survfit(Surv(survival, vital.status) ~ TGM5, data = df)
The problem I run into is when I want to do this for many genes. I've created a character vector of the gene names I'm interested in:
> genes <- names(df[1:3])
> genes[1]
[1] "TGM5"
But if I call
fit <- survfit(Surv(survival, vital.status) ~ (genes[1]), data = df)
I get the error
Error in model.frame.default(formula = Surv(survival, vital.status) ~ (genes[1]), :
variable lengths differ (found for 'genes[1]')
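A quick check, reusing the sample data above, seems consistent with the error message: genes[1] is a single string, whereas the column it names holds one value per patient. This is only a diagnostic sketch in base R, not a fix.

```r
# Diagnostic sketch: compare what the formula sees in each case.
df <- read.table(header = TRUE, text = "TGM5 TGM6 TGM7 TPI1 survival vital.status
2 1 1 2 1.419178082 2
2 1 1 1 5 1
2 1 1 2 1.082191781 2
1 1 1 1 0.038356164 1
2 1 2 2 0.77260274 2
1 1 2 2 2.336986301 1
2 1 2 1 1.271232877 1")

genes <- names(df[1:3])

# genes[1] is just the string "TGM5": a character vector of length 1
len_name <- length(genes[1])        # 1

# the column itself has one entry per patient
len_col <- length(df$TGM5)          # 7

# looking the column up by its name gives the same 7 values
len_lookup <- length(df[[genes[1]]])  # 7
```

So `~ genes[1]` appears to hand the model a length-1 string rather than the 7-row column, which would explain "variable lengths differ".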
I assume there is a difference between naming TGM5 directly and passing it as an element of the gene vector, and that the solution is very simple, but I'm at a loss as to how to approach this. I've tried using gsub(), without success.
Finally, since I would like to extend this code to many genes, I'd like to avoid writing a for loop. Is there a vectorized way to go about this?
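For reference, the kind of loop-free approach I have in mind might look like the sketch below. The helper name fit_one is hypothetical, and building the formula from a string with as.formula(paste(...)) is my assumption about how the gene name could be passed. I have not confirmed it is the recommended way.

```r
# Sketch only: one survfit per gene via lapply instead of a for loop.
library(survival)

df <- read.table(header = TRUE, text = "TGM5 TGM6 TGM7 TPI1 survival vital.status
2 1 1 2 1.419178082 2
2 1 1 1 5 1
2 1 1 2 1.082191781 2
1 1 1 1 0.038356164 1
2 1 2 2 0.77260274 2
1 1 2 2 2.336986301 1
2 1 2 1 1.271232877 1")

genes <- names(df)[1:3]

# Hypothetical helper: turn a gene name (a string) into a formula,
# then fit the survival curves for that gene.
fit_one <- function(gene, data) {
  f <- as.formula(paste("Surv(survival, vital.status) ~", gene))
  survfit(f, data = data)
}

# Apply the helper to every gene name; the result is a list of survfit objects.
fits <- lapply(genes, fit_one, data = df)
names(fits) <- genes
```

With this, fits[["TGM5"]] would hold the same kind of object as the single call that already works, and extending to more genes only means lengthening genes.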
Many thanks.