0

I am attempting to do survival analysis on several genes through using a function. The patients are grouped according to 'high'(2) or 'low'(1) gene expression.

I'm having trouble with how R is understanding my code. Here is some sample data:

df <- read.table(header=T, text="TGM5    TGM6    TGM7    TPI1    survival    vital.status
2   1   1   2   1.419178082 2
2   1   1   1   5   1
2   1   1   2   1.082191781 2
1   1   1   1   0.038356164 1
2   1   2   2   0.77260274  2
1   1   2   2   2.336986301 1
2   1   2   1   1.271232877 1")

the following code works fine:

fit<- survfit(Surv(survival, vital.status)~TGM5), data =df)

The problem I run into is when I want to do this for many genes. I've create an array/list of gene names I'm interested in:

> genes <- names(df[1:3])
> genes[1]
[1] "TGM5"

but if I call

fit<- survfit(Surv(survival, vital.status)~genes[1]), data =df)

I get the error

    Error in model.frame.default(formula = Surv(survival, vital.status) ~ (genes[1]),  : 
  variable lengths differ (found for 'genes[1]')

I assume there is a difference in when I call TGM5 directly vs. when it's called as an element from the gene list and the solution is very simple. I'm at a loss as to how to approach this. I've attempted using gsub() but without success.

Finally, as I would like to expand this code over many genes, I'd like to avoid creating a for loop, is there a vectorized way I could go about this?

Many thanks.

Tkanno
  • 656
  • 6
  • 14
  • 1
    Why are you using `genes[1]`? You need all of `genes`, don’t you? – Konrad Rudolph Aug 04 '16 at 16:29
  • I'm using genes[1] as an example of the problem I'm running into. I will eventually want to analyse all the genes but I am not aware if survfit() will like taking multiple genes at once. – Tkanno Aug 04 '16 at 16:38
  • 1
    Maybe somewhat related http://stackoverflow.com/questions/22617354/object-not-found-error-within-a-user-defined-function-eval-function/ – David Arenburg Aug 05 '16 at 09:22
  • Yes it seems to be the exact same problem. Thanks for pointing me to that! – Tkanno Aug 08 '16 at 11:27

2 Answers2

1

This hack works:

survfit(as.formula(paste0("Surv(survival, vital.status)~",genes[1])), data =df)

Call: survfit(formula = as.formula(paste0("Surv(survival, vital.status)~", 
    genes[1])), data = df)

       n events median 0.95LCL 0.95UCL
TGM5=1 2      0     NA      NA      NA
TGM5=2 5      3   1.42    1.08      NA

I paste the formula together and then coerce it into a formula object. There is a nicer function called reformulate that will often work for this, but I couldn't get it to work with

Surv(survival, vital.status)

as the response argument level.

data

df <- read.table(header=T, text="TGM5    TGM6    TGM7    TPI1    survival    vital.status
2   1   1   2   1.419178082 2
2   1   1   1   5   1
2   1   1   2   1.082191781 2
1   1   1   1   0.038356164 1
2   1   2   2   0.77260274  2
1   1   2   2   2.336986301 1
2   1   2   1   1.271232877 1")
lmo
  • 37,904
  • 9
  • 56
  • 69
0

You need to remove the quotation marks: noquote(genes[1])