Regression through different data frames

Question

I have several dataframes (with different names) like those below, with the same number of rows and columns but different names for the last column.

df1:

ID  matching_variable   STATUS  code_1
1   1   1   1
2   1   0   1
3   2   1   0
4   2   1   0

df2:

ID  matching_variable   STATUS  code_2
1   1   1   1
2   1   0   0
3   2   1   0
4   2   1   1

I have about a dozen data frames like this, and I would like to fit a conditional logistic regression of this form to each one:

fit1<-clogit(STATUS~code_1+strata(matching_variable),data=df1)
fit2<-clogit(STATUS~code_2+strata(matching_variable),data=df2)

etc….

I would like to make a function to "automate" this (without having to write all the regressions) and have all the outputs of the regressions in a new table.

I thought of using something like the code below, but I get stuck because the data frames and their last columns all have different names:

list<-list(df1,df2)

results<- lapply(list, function(x) {clogit(STATUS ~ code_??? + strata(matching_variable), data=???, l)})

Thank you in advance.

score 1 · Answer 1 · answered Apr 06 '22 at 07:04

Write a custom function that finds the name of the last column and uses it to build the formula passed to clogit. Something like the code below (not tested):

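library(survival)  # provides clogit() and strata()

myClogit <- function(d) {
  # The predictor is assumed to be the last column of each data frame
  lastColName <- tail(colnames(d), 1)
  # Build e.g. STATUS ~ code_1 + strata(matching_variable)
  f <- as.formula(
    paste("STATUS ~", lastColName, "+ strata(matching_variable)"))
  clogit(f, data = d)
}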

Then make a list of dataframes and loop:

lapply(list(df1, df2), myClogit)
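The question also asks for all the results in one table, and the comments ask about confidence intervals. A minimal sketch of one way to get both, assuming that `summary()` on a `clogit` fit exposes the `coefficients` and `conf.int` matrices the same way it does for `coxph` fits. The toy data values and the names `dfs`, `fits`, and `results_table` are invented for illustration:

```r
library(survival)

# Hypothetical toy data: four 1:1 matched pairs per data frame (values invented)
df1 <- data.frame(
  ID = 1:8,
  matching_variable = rep(1:4, each = 2),
  STATUS = rep(c(1, 0), times = 4),
  code_1 = c(1, 0, 1, 0, 1, 0, 0, 1)
)
df2 <- data.frame(
  ID = 1:8,
  matching_variable = rep(1:4, each = 2),
  STATUS = rep(c(1, 0), times = 4),
  code_2 = c(1, 0, 0, 1, 1, 0, 0, 1)
)

# Same idea as above: use the last column as the predictor
myClogit <- function(d) {
  lastColName <- tail(colnames(d), 1)
  f <- as.formula(
    paste("STATUS ~", lastColName, "+ strata(matching_variable)"))
  clogit(f, data = d)
}

# A named list keeps track of which fit belongs to which data frame
dfs <- list(df1 = df1, df2 = df2)
fits <- lapply(dfs, myClogit)

# One row per fit: coefficient, odds ratio, and 95% confidence limits
rows <- lapply(names(fits), function(nm) {
  s <- summary(fits[[nm]])
  data.frame(
    data = nm,
    term = rownames(s$coefficients),
    coef = s$coefficients[, "coef"],
    odds_ratio = s$conf.int[, "exp(coef)"],
    lower_95 = s$conf.int[, "lower .95"],
    upper_95 = s$conf.int[, "upper .95"],
    row.names = NULL
  )
})
results_table <- do.call(rbind, rows)
print(results_table)
```

The confidence limits here are on the odds-ratio scale, as reported by `summary()`. With very small or separated data (such as the four-row examples in the question) the fits may not converge, so the resulting table should be checked for NA or extreme values.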

zx8754

Thank you, this solution also works, but it doesn't give me the confidence intervals either. Do you know how I can get them? – lM__3 Apr 06 '22 at 07:46
@lM__3 Post this as a new question: "How to get confidence intervals from clogit". But first check whether it has already been asked. – zx8754 Apr 06 '22 at 07:51
I finally found it, thanks for your help! – lM__3 Apr 06 '22 at 07:54

score 1 · Accepted Answer · answered Apr 06 '22 at 07:23

Another possible solution, based on purrr::map:

library(purrr)
library(survival)

map(list(df1, df2), ~ clogit(STATUS ~ .x[,4] + strata(matching_variable), data=.x)) 

#> Warning in coxexact.fit(X, Y, istrat, offset, init, control, weights =
#> weights, : Ran out of iterations and did not converge
#> [[1]]
#> Call:
#> clogit(STATUS ~ .x[, 4] + strata(matching_variable), data = .x)
#> 
#>         coef exp(coef) se(coef)  z  p
#> .x[, 4]   NA        NA        0 NA NA
#> 
#> Likelihood ratio test=0  on 0 df, p=1
#> n= 4, number of events= 3 
#> 
#> [[2]]
#> Call:
#> clogit(STATUS ~ .x[, 4] + strata(matching_variable), data = .x)
#> 
#>              coef exp(coef)  se(coef)     z     p
#> .x[, 4] 2.020e+01 5.943e+08 2.438e+04 0.001 0.999
#> 
#> Likelihood ratio test=1.39  on 1 df, p=0.239
#> n= 4, number of events= 3
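
In the output above the coefficient is labelled `.x[, 4]` because the formula refers to the column by position. A possible variant looks the column name up by position and builds the formula with `reformulate()`, so that the coefficient carries the actual column name. It is sketched here with base `lapply()` in place of `map()`, but the same function can be passed to `map()` unchanged. The helper name `fit_by_position` and the toy data are hypothetical:

```r
library(survival)

# Hypothetical toy data: four 1:1 matched pairs per data frame (values invented)
df1 <- data.frame(
  ID = 1:8,
  matching_variable = rep(1:4, each = 2),
  STATUS = rep(c(1, 0), times = 4),
  code_1 = c(1, 0, 1, 0, 1, 0, 0, 1)
)
df2 <- data.frame(
  ID = 1:8,
  matching_variable = rep(1:4, each = 2),
  STATUS = rep(c(1, 0), times = 4),
  code_2 = c(1, 0, 0, 1, 1, 0, 0, 1)
)

# Look up the predictor's name by column position, then build the formula
fit_by_position <- function(d, pos = 4) {
  predictor <- names(d)[pos]
  f <- reformulate(c(predictor, "strata(matching_variable)"),
                   response = "STATUS")
  clogit(f, data = d)
}

fits <- lapply(list(df1 = df1, df2 = df2), fit_by_position)

# Each coefficient is now labelled with its column name
coef_names <- vapply(fits, function(fit) names(coef(fit)), character(1))
print(coef_names)
```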

PaulS

Thank you very much, it seems to work! Do you know how I can get the confidence interval please? – lM__3 Apr 06 '22 at 07:37
I am not familiar with `clogit`. So, if you tell me how to calculate the confidence interval for a single regression, I will adapt my code to do that for all regressions. – PaulS Apr 06 '22 at 07:47
I found it by adapting the code a bit. The summary function gives the interval without any extra calculation: `results<-map(list(df1, df2), ~ clogit(STATUS ~ .x[,4]+strata(matching_variable), data=.x))` `lapply(results,summary)` – lM__3 Apr 06 '22 at 07:52
You can shorten the code: `map(list(df1, df2), ~ summary(clogit(STATUS ~ .x[,4] + strata(matching_variable), data=.x)))`. There is no need for your `lapply`! – PaulS Apr 06 '22 at 07:58
