
I have moderate experience with R. I am trying to run Cox regressions in a for loop using the survival package. My data frame (df1) contains multiple health outcomes as "events". For each outcome, I want to model time to event as a function of "FA_low", adding age, sex and pc1-pc10 as covariates.

This is a subset of the data frame (df1), generated with dput(df1[1:2, -c(3,4)]):

structure(list(id = c("1000016", "1000028"), FA_low = c("1", 
"1"), sex = c("F", "F"), age = c(56L, 66L), pc1 = c(125.117, 
-9.61593), pc2 = c(-67.8548, 5.7494), pc3 = c(57.7852, -1.71108
), pc4 = c(7.68796, -4.73091), pc5 = c(0.445619, -3.22911), pc6 = c(2.93785, 
-0.0760323), pc7 = c(7.02217, 2.93723), pc8 = c(4.40888, 0.982279
), pc9 = c(-0.704416, -0.161818), pc10 = c(5.46248, -0.579022
), time = c(5, 5), '250' = c(FALSE, FALSE), '250.2' = c(FALSE, 
FALSE), '250.23' = c(FALSE, FALSE), '272' = c(NA, FALSE), '272.1' = c(NA, 
FALSE), '272.11' = c(NA, FALSE), '274.1' = c(FALSE, FALSE), '278' = c(FALSE, 
FALSE), '278.1' = c(FALSE, FALSE), '351' = c(FALSE, FALSE), '401' = c(NA, 
FALSE), '401.1' = c(NA, FALSE), '411' = c(NA, FALSE), '411.4' = c(NA, 
FALSE), '411.8' = c(NA, FALSE), '454' = c(FALSE, FALSE), '454.1' = c(FALSE, 
FALSE), '512.7' = c(FALSE, FALSE), '550' = c(NA, FALSE), '550.2' = c(NA, 
FALSE), '550.4' = c(NA, FALSE), '740' = c(NA, FALSE), '740.1' = c(NA, 
FALSE), '907' = c(FALSE, FALSE)), row.names = 1:2, class = "data.frame")

Structure:

'data.frame':   426295 obs. of  41 variables:
 $ id             : chr  "1000016" "1000028" "1000033" "1000042" ...
 $ FA_low         : chr  "1" "1" "0" "0" ...
 $ sex            : chr  "F" "F" "F" "F" ...
 $ age            : int  56 66 64 50 69 63 42 41 62 64 ...
 $ pc1            : num  125.12 -9.62 -12.53 -12.29 -11.33
 $ time           : num  5 5 5 5 5 5 5 5 5 5 ...
 $ 250            : logi  FALSE FALSE FALSE NA FALSE FALSE ...

When I run the analysis for each health outcome separately, without a loop, it works fine. But when I loop over the health outcomes as follows:

for (i in 1:24) {
  df.model <- na.omit(df1[c(1:17, 17 + i)])
  cox.mod <- coxph(Surv(time, i) ~ FA_low + age + sex + pc1 + pc2 + pc3 + pc4 + pc5 + pc6 + pc7 + pc8 + pc9 + pc10, data = df.model)
  cox1 <- summary(cox.mod)
}

I get the following error: Error in Surv(time, i) : Time and status are different lengths

The number of observations in these columns is the same. I suspect my for loop does not match the way the Surv() function works. I went through the documentation for Surv() in the survival package, but I still can't solve this. I have seen questions and answers about looping over the time variable, but not over event variables. How do I write a for loop that iterates over the event columns in this survival analysis?

md93
  • I'd like to try to help you answer your question. Are you able to share example data? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Skaqqs Sep 01 '21 at 12:32
  • I just noticed you are using `na.omit()` to define your modelling dataset; this may lead to your error. What happens when you leave `NA` in your data? – Skaqqs Sep 01 '21 at 12:35
  • Thanks for this. I have made edits to the original post showing a subset of my dataframe. I also tried running the code without na.omit() as you suggested but I still get the same error... – md93 Sep 01 '21 at 13:35
  • Try placing `print(i)` in the first line of your loop and investigating the column (iteration) that gives the error. – Skaqqs Sep 01 '21 at 14:14
  • Try replacing `Surv(time, i)` with `Surv(time, df.model[[i]])` – Skaqqs Sep 01 '21 at 14:29

1 Answer


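I think the error you're seeing comes from how Surv() evaluates its arguments inside coxph(). The formula is evaluated in the data, so `time` is looked up as a column. `i`, however, is not a column: it is just your loop counter, a single integer. Surv() therefore receives a time vector with one value per row and a status of length 1, hence "Time and status are different lengths". Surv() expects the status values (or a column name), not a column position. One solution is to pass the values of each status column directly. Check this out: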

library(survival)
#> Warning: package 'survival' was built under R version 4.0.5

test1 <- list(time=c(4,3,1,1,2,2,3), 
              status=c(1,1,1,0,1,1,0), 
              x=c(0,2,1,1,1,0,0), 
              sex=c(0,0,0,0,1,1,1),
              status2=c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)) 

## This works

coxph(Surv(time, status) ~ x + strata(sex), test1)
#> Call:
#> coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1)
#> 
#>     coef exp(coef) se(coef)     z     p
#> x 0.8023    2.2307   0.8224 0.976 0.329
#> 
#> Likelihood ratio test=1.09  on 1 df, p=0.2971
#> n= 7, number of events= 5

## This doesn't work

coxph(Surv(time, 2) ~ x + strata(sex), test1)
#> Error in Surv(time, 2): Time and status are different lengths

## This works

coxph(Surv(time, test1[[2]]) ~ x + strata(sex), test1)
#> Call:
#> coxph(formula = Surv(time, test1[[2]]) ~ x + strata(sex), data = test1)
#> 
#>     coef exp(coef) se(coef)     z     p
#> x 0.8023    2.2307   0.8224 0.976 0.329
#> 
#> Likelihood ratio test=1.09  on 1 df, p=0.2971
#> n= 7, number of events= 5
Created on 2021-09-01 by the reprex package (v2.0.1)

Note that in my example (from the survival documentation), test1 is a list. `[[` also works on data frames, so df.model[[i]] (or df.model[, i]) should work without converting df.model to a list. Also, shouldn't the index in Surv() always be 18? In every iteration, df.model is built from columns 1-17 plus one outcome, so the event data is always in its 18th column.
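As a quick check of the list-versus-data-frame point, here is a minimal sketch that reuses the toy data above but stores it as a data.frame (the name `test_df` is just for illustration). `[[` and `[, j]` both return the column as a plain vector, and passing that vector to Surv() should give the same fit as the list version:

```r
library(survival)

# The toy data from above, stored as a data.frame instead of a list
test_df <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                      status = c(1, 1, 1, 0, 1, 1, 0),
                      x      = c(0, 2, 1, 1, 1, 0, 0),
                      sex    = c(0, 0, 0, 0, 1, 1, 1))

# Both forms return the second column as a plain vector of length nrow(test_df)
identical(test_df[[2]], test_df[, 2])  # TRUE

# Passing the status vector itself (not the number 2) works with a data.frame too
fit <- coxph(Surv(time, test_df[[2]]) ~ x + strata(sex), data = test_df)
coef(fit)
```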

Skaqqs
  • Thank you @Skaqqs, this makes sense, and the error went away when I used `Surv(time, df.model[[i+17]])`. However, this only works for the first iteration, i.e. when `i <- 1`. For the remaining iterations I now get this error: `Error in df.model[[i + 17]] : subscript out of bounds`. The error remains even when I convert `df.model` to a list or use `df.model[,i+17]`... – md93 Sep 01 '21 at 16:43
  • Oh my bad! I think instead of `df.model[[i + 17]]` you want `df.model[[18]]`. Every `df.model` has 18 columns (columns 1-17 of df1 plus one outcome), so the 18th column is always your event, and which event it is changes at each iteration of the loop. Column `i + 17` only exists when `i` is 1, hence the out-of-bounds error. – Skaqqs Sep 01 '21 at 16:58
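Putting the thread together, here is a sketch of the whole loop under stated assumptions: `df1` below is simulated, hypothetical data with only 6 leading columns and 2 outcome columns (the real df1 has 17 and 24), and only pc1 and pc2 are used as covariates. The event is always taken from the last column of `df.model` (the 18th column in the question's layout), and the FA_low hazard-ratio row is collected per outcome:

```r
library(survival)

# Hypothetical, simulated stand-in for df1 (the real data is not shown):
# 6 leading columns, then one logical column per health outcome.
set.seed(42)
n <- 300
df1 <- data.frame(FA_low = sample(c("0", "1"), n, replace = TRUE),
                  sex    = sample(c("F", "M"), n, replace = TRUE),
                  age    = sample(40:70, n, replace = TRUE),
                  pc1    = rnorm(n),
                  pc2    = rnorm(n),
                  time   = runif(n, 1, 5))
df1[["250"]] <- runif(n) < 0.3
df1[["272"]] <- ifelse(runif(n) < 0.1, NA, runif(n) < 0.3)

n_lead  <- 6                   # non-outcome columns (17 in the question)
n_out   <- ncol(df1) - n_lead  # outcome columns (24 in the question)
results <- vector("list", n_out)

for (i in seq_len(n_out)) {
  # Leading columns plus the i-th outcome, so the outcome is the last column
  df.model <- na.omit(df1[c(1:n_lead, n_lead + i)])

  # Pass the status values themselves, not the loop index i
  cox.mod <- coxph(Surv(time, df.model[[ncol(df.model)]]) ~ FA_low + age + sex + pc1 + pc2,
                   data = df.model)

  # Hazard ratio and 95% CI for FA_low ("1" vs "0")
  results[[i]] <- summary(cox.mod)$conf.int["FA_low1", ]
}
names(results) <- names(df1)[n_lead + seq_len(n_out)]
results
```

With the real data, `n_lead` would be 17 and the formula would list pc1-pc10. Building the formula from column names with `as.formula()` (backticking names such as `250`) would be an alternative that does not rely on column positions.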