If I train a Cox model in mlr using resampling with 5-fold cross-validation, the Concordance printed in the summary of each fold's Cox model differs from the cindex that mlr calculates for that fold. Am I interpreting this incorrectly? Or am I using too many predictors? If so, why would that cause this discrepancy?

In the example below, mlr returns a cindex value of 0.5093809 for the first fold, but the cox summary output reports a Concordance of 0.76. My data can be downloaded here: https://www.dropbox.com/s/nt9s3p1rdafq465/test_data.csv?dl=0

Resampling:

library(survival)
library(mlr)

# Load the data and define the survival task
mydata <- read.csv(file = "test_data.csv", header = TRUE, sep = ",", row.names = NULL)
surv.task <- makeSurvTask(data = mydata, target = c("timeToEvent", "status"))

# 5-fold stratified CV; models = TRUE keeps the fitted model of each fold
rdesc <- makeResampleDesc(method = "CV", iters = 5, stratify = TRUE)
r <- resample("surv.coxph", surv.task, rdesc, models = TRUE)
r

Resample Result
Task: mydata
Learner: surv.coxph
Aggr perf: cindex.test.mean=0.5999838
Runtime: 0.151174

r$measures.test
 iter    cindex
1    1 0.5093809
2    2 0.7324649
3    3 0.4984653
4    4 0.6461876
5    5 0.6134201

Check the summary of the Cox model for the first fold:

summary(getLearnerModel(r$models[[1]]))

Call:
survival::coxph(formula = f, data = data)

  n= 698, number of events= 65 

          coef  exp(coef)   se(coef)      z Pr(>|z|)    
V1  -0.1225832  0.8846323  0.1833418 -0.669 0.503748    
V2  -1.9815012  0.1378621  2.9565667 -0.670 0.502728    
V3  -0.5894775  0.5546170  1.9276623 -0.306 0.759758    
V4   0.5005582  1.6496418  0.9433060  0.531 0.595667    
V5   0.0179647  1.0181271  1.9273040  0.009 0.992563    
V6   0.7309210  2.0769926  1.9361340  0.378 0.705790    
V7  -0.0012070  0.9987937  0.0890533 -0.014 0.989186    
V8   0.1029020  1.1083828  0.0356533  2.886 0.003899 ** 
V9  -0.2728561  0.7612023  0.2311420 -1.180 0.237813    
V10 -0.0213663  0.9788604  0.0133210 -1.604 0.108725    
V11  0.2416705  1.2733746  0.2113099  1.144 0.252757    
V12 -0.0021392  0.9978631  0.0550684 -0.039 0.969014    
V13 -0.0047373  0.9952739  0.0073776 -0.642 0.520794    
V14  0.0119084  1.0119796  0.0036098  3.299 0.000971 ***
V15 -6.6529859  0.0012902  2.8566451 -2.329 0.019862 *  
V16 -0.0005712  0.9994290  0.0015808 -0.361 0.717842    
V17 -0.0058360  0.9941810  0.0970749 -0.060 0.952062    
V18 -0.0095129  0.9905322  0.0072980 -1.304 0.192402    
V19  0.0004149  1.0004150  0.0002001  2.074 0.038107 *  
V20  0.0001584  1.0001584  0.0002319  0.683 0.494487    
V21 -0.0010930  0.9989076  0.0045039 -0.243 0.808255    
V22 -0.0015312  0.9984700  0.0023389 -0.655 0.512699    
V23 -0.0441918  0.9567705  0.0936314 -0.472 0.636944    
V24  0.0475120  1.0486588  0.0681332  0.697 0.485590    
V25  0.1637753  1.1779496  0.1177553  1.391 0.164283    
V26 -0.0296841  0.9707521  0.0460953 -0.644 0.519593    
V27 -0.1181631  0.8885511  0.0824113 -1.434 0.151623    
V28  0.0081237  1.0081568  0.0106226  0.765 0.444419    
V29 -0.0409860  0.9598425  0.0282858 -1.449 0.147339    
V30  0.0006100  1.0006102  0.0002408  2.533 0.011293 *  
V31 -0.0016426  0.9983587  0.0054629 -0.301 0.763655    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
V1    0.88463     1.1304 6.176e-01    1.2671
V2    0.13786     7.2536 4.196e-04   45.2980
V3    0.55462     1.8030 1.268e-02   24.2562
V4    1.64964     0.6062 2.597e-01   10.4793
V5    1.01813     0.9822 2.330e-02   44.4965
V6    2.07699     0.4815 4.671e-02   92.3581
V7    0.99879     1.0012 8.388e-01    1.1893
V8    1.10838     0.9022 1.034e+00    1.1886
V9    0.76120     1.3137 4.839e-01    1.1974
V10   0.97886     1.0216 9.536e-01    1.0048
V11   1.27337     0.7853 8.416e-01    1.9267
V12   0.99786     1.0021 8.958e-01    1.1116
V13   0.99527     1.0047 9.810e-01    1.0098
V14   1.01198     0.9882 1.005e+00    1.0192
V15   0.00129   775.0952 4.776e-06    0.3485
V16   0.99943     1.0006 9.963e-01    1.0025
V17   0.99418     1.0059 8.219e-01    1.2025
V18   0.99053     1.0096 9.765e-01    1.0048
V19   1.00041     0.9996 1.000e+00    1.0008
V20   1.00016     0.9998 9.997e-01    1.0006
V21   0.99891     1.0011 9.901e-01    1.0078
V22   0.99847     1.0015 9.939e-01    1.0031
V23   0.95677     1.0452 7.964e-01    1.1495
V24   1.04866     0.9536 9.176e-01    1.1985
V25   1.17795     0.8489 9.352e-01    1.4837
V26   0.97075     1.0301 8.869e-01    1.0625
V27   0.88855     1.1254 7.560e-01    1.0443
V28   1.00816     0.9919 9.874e-01    1.0294
V29   0.95984     1.0418 9.081e-01    1.0146
V30   1.00061     0.9994 1.000e+00    1.0011
V31   0.99836     1.0016 9.877e-01    1.0091

Concordance= 0.76  (se = 0.037 )
Rsquare= 0.087   (max possible= 0.68 )
Likelihood ratio test= 63.69  on 31 df,   p=5e-04
Wald test            = 67.74  on 31 df,   p=2e-04
Score (logrank) test = 70.07  on 31 df,   p=7e-05
panda
  • Since `surv.task` is not defined in your script, it is incredibly difficult to reproduce the error and help answer your problem. Try `dput`-ing your data, and check out this resource on how to produce a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – astrofunkswag Nov 30 '18 at 00:02
  • Sorry. I have edited the code and also added a download link to the data. – panda Nov 30 '18 at 01:09

1 Answer

The Concordance reported by the Cox model's summary is calculated on the training data the model was fitted to, whereas mlr calculates cindex on the out-of-sample (test) data of each fold. That is the difference, and unsurprisingly the out-of-sample value is much worse. ;)
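
To make the in-sample versus out-of-sample distinction concrete, here is a minimal sketch using only the survival package. It does not reproduce the asker's numbers: the built-in `lung` data set, the chosen predictors, the seed, and the single 80/20 split are all assumptions standing in for `test_data.csv` and one CV fold. It fits a Cox model on the training part, then computes the concordance once on the training data (the kind of value `summary()` prints) and once on the held-out data (the kind of value a resampling measure reports).

```r
library(survival)

# Assumption: built-in `lung` data stands in for the asker's test_data.csv
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog", "wt.loss")])

# Assumption: one random 80/20 split stands in for a single CV fold
set.seed(1)
test_idx <- sample(nrow(d), size = floor(nrow(d) / 5))
train <- d[-test_idx, ]
test  <- d[test_idx, ]

# Fit the Cox model on the training part only
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog + wt.loss, data = train)

# In-sample concordance: evaluated on the same data used for fitting
c_train <- concordance(fit)$concordance

# Out-of-sample concordance: predict the linear predictor for unseen rows.
# reverse = TRUE because a higher risk score implies shorter survival.
test$lp <- predict(fit, newdata = test, type = "lp")
c_test <- concordance(Surv(time, status) ~ lp, data = test,
                      reverse = TRUE)$concordance

cat("in-sample concordance:     ", round(c_train, 3), "\n")
cat("out-of-sample concordance: ", round(c_test, 3), "\n")
```

The in-sample value is usually the more optimistic of the two, and the gap tends to grow when the number of predictors is large relative to the number of events, as with 31 predictors and 65 events in the fold shown in the question.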

PhilippPro