How to reproduce Cox survival model results from Stata in R?

Question

I am attempting to reproduce in R Cox survival model results originally obtained in Stata. Here is the Stata code:

stset t, id(leadid) failure(c_coup)

stcox legislature  lgdp_1 growth_1 exportersoffuelsmainlyoil_EL2008 ethfrac_FIXED communist mil cw age

Here is the code that I wrote do reproduce this in R:

# Load survival package
library(survival)

# Set the survival object
surv_obj <- Surv(data$t, data$c_coup)

# Run model
m1 <- coxph(surv_obj ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age, data = data, method = "breslow")

# Examine hazard ratios
exp(coef(m1))

For whatever reason, I am unable to obtain the same results. For example, the estimate for legislature in the Stata results (the original results I want to produce) is 0.298. In R, it is 0.1688371. Any advice would be appreciated.

Here is dataset preview for reproduction:

structure(list(t = structure(c(1, 2, 3, 4, 5, 6), label = "Current time in office", format.stata = "%9.0g"), 
    c_coup = structure(c(0, 0, 0, 0, 0, 0), format.stata = "%9.0g"), 
    leadid = structure(c("A2.2-208", "A2.2-208", "A2.2-208", 
    "A2.2-208", "A2.2-208", "A2.2-208"), label = "Leader ID", format.stata = "%13s"), 
    legislature = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"), 
    lgdp_1 = structure(c(7.68524360656738, 7.69938945770264, 
    7.54960918426514, 7.57916784286499, 7.6033992767334, 7.67089462280273
    ), format.stata = "%9.0g"), growth_1 = structure(c(6.35386085510254, 
    1.42463231086731, -13.910285949707, 3, 2.45273375511169, 
    6.98254346847534), label = "annual growth, t-1, Maddison", format.stata = "%9.0g"), 
    exportersoffuelsmainlyoil_EL2008 = structure(c(0, 0, 0, 0, 
    0, 0), format.stata = "%8.0g"), ethfrac_FIXED = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), label = "eth. frac", format.stata = "%8.0g"), 
    communist = structure(c(0, 0, 0, 0, 0, 0), label = "Communist Leader", format.stata = "%8.0g"), 
    mil = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"), 
    cw = structure(c(1, 1, 1, 1, 1, 1), format.stata = "%9.0g"), 
    age = structure(c(52, 53, 54, 55, 56, 57), label = "Current age", format.stata = "%9.0g")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

langtang · Accepted Answer · 2023-05-12T23:38:12.477

1

You can try setting a t0 for each row:

library(dplyr)
library(foreign)

data = read_stata("leaders.dta")
data = mutate(data,t0 = lag(t,default=0), .by=leadid)

survobj = Surv(data[["t0"]], data[["_t"]], data$c_coup)
coxph(survobj~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age, 
      data=data, ties="breslow")

Output:

Call:
coxph(formula = survobj ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + 
    ethfrac_FIXED + communist + mil + cw + age, data = data, 
    ties = "breslow")

                                      coef exp(coef)  se(coef)      z        p
legislature                      -1.212411  0.297479  0.226754 -5.347 8.95e-08
lgdp_1                           -0.337226  0.713748  0.146391 -2.304  0.02125
growth_1                          0.009499  1.009544  0.016494  0.576  0.56470
exportersoffuelsmainlyoil_EL2008  0.470004  1.600001  0.304949  1.541  0.12325
ethfrac_FIXED                    -0.005648  0.994368  0.003489 -1.619  0.10552
communist                        -1.800431  0.165228  1.008908 -1.785  0.07434
mil                               1.157206  3.181034  0.265029  4.366 1.26e-05
cw                                0.886741  2.427206  0.342806  2.587  0.00969
age                               0.027398  1.027776  0.010398  2.635  0.00842

Likelihood ratio test=103.8  on 9 df, p=< 2.2e-16
n= 2903, number of events= 116 
   (2751 observations deleted due to missingness)

edited May 12 '23 at 23:38

answered May 12 '23 at 22:52

langtang

22,248
1
12
27

Thanks. This brings me pretty close to the results, although they're still a little off. Could it be that I will never be able to obtain exactly the same results in R? – w5698 May 12 '23 at 23:09
maybe if you shared a bit more of your example data I could be of more assistance.. You should be able to replicate exactly. Do you by chance have any rows in your `data` where the **first row** in the `leadid` group has `c_coup` = 1? – langtang May 12 '23 at 23:10
Thanks. Yes, I believe I do. Here is a link to the full dataset: https://drive.google.com/file/d/1JEO3jzcomsA1Tp8J7O5x-X5Q08UOkG1N/view?usp=share_link – w5698 May 12 '23 at 23:15
These are iterative maximum likelihood procedures; under the hood stata and R may be using slighly different parameters/controls settings – langtang May 12 '23 at 23:17
i'm getting the exact same results. What difference are you seeing? – langtang May 12 '23 at 23:30
Can you provide the code you used? Is it different from mine? You are getting (0.298) for the estimate on legislature when you do this in R? – w5698 May 12 '23 at 23:34
I'm getting `0.297479` in R and `.2974792` in Stata – langtang May 12 '23 at 23:37
It will not work on my PC. However, I just logged onto the RStudio server on my university's computing cluster and got it to work. I wonder why it won't work on my own computer! – w5698 May 12 '23 at 23:47
perhaps check versions of packages and see if these are somehow different between machines.glad you got it to work – langtang May 12 '23 at 23:49
any chance you were getting `0.3348482` in R? – langtang May 12 '23 at 23:52
Yes, I was getting that! – w5698 May 13 '23 at 00:29
you were creating the t0 overall, rather than separately by `leadid`. Perhaps you forgot `.by=leadid` in the `mutate(...)` call. On an older version of dplyr, you would not have been able to use `reframe` and would isntead have done `data=group_by(data, leadid) %>% mutate(t0 = lag(t, default=0))`. Either way, grouping by `leadid` is essential. – langtang May 13 '23 at 00:33

How to reproduce Cox survival model results from Stata in R?

1 Answers1