Convert cox regression table to forest plot

Question

I want to convert a cox table to forest plot as showed below. Unforunatly I’ve lost my original data (coxph object) so I have to use the data from the table. Data below are just examples:

Desired output:

Reprex for the two tables:

GRP1<-tibble::tribble(
    ~Variable,   ~Level,        ~Number,         ~`HR.(univariable)`,       ~`HR.(multivariable)`,
        "Sex", "Female", "2204 (100.0)",                          NA,                          NA,
           NA,   "Male", "2318 (100.0)", "1.13 (0.91-1.40, p=0.265)", "1.13 (0.91-1.40, p=0.276)",
      "Score",      "1", "2401 (100.0)",                          NA,                          NA,
           NA,    "1-2", "1637 (100.0)", "1.49 (1.19-1.87, p=0.001)", "1.15 (0.90-1.47, p=0.250)",
           NA,    "3-4",  "412 (100.0)", "1.71 (1.14-2.56, p=0.010)", "1.09 (0.71-1.67, p=0.710)",
           NA,    ">=5",   "42 (100.0)", "1.67 (0.53-5.21, p=0.381)", "0.96 (0.30-3.05, p=0.943)",
  "Treatment",      "A", "1572 (100.0)",                          NA,                          NA,
           NA,      "B", "2951 (100.0)", "1.74 (1.26-2.40, p=0.001)", "1.53 (1.09-2.13, p=0.013)"
  )


GRP2<-tibble::tribble(
    ~Variable,   ~Level,        ~Number,          ~`HR.(univariable)`,          ~`HR.(univariable)`,
        "Sex", "Female", "2204 (100.0)",                           NA,                           NA,
           NA,   "Male", "2318 (100.0)",  "1.70 (1.36-2.13, p<0.001)",  "1.62 (1.28-2.04, p<0.001)",
      "Score",      "1", "2401 (100.0)",                           NA,                           NA,
           NA,    "1-2", "1637 (100.0)",  "2.76 (1.21-6.29, p=0.016)",  "2.69 (1.18-6.13, p=0.019)",
           NA,    "3-4",  "412 (100.0)", "5.11 (2.26-11.58, p<0.001)", "4.46 (1.95-10.23, p<0.001)",
           NA,    ">=5",   "42 (100.0)", "5.05 (2.19-11.64, p<0.001)",  "4.08 (1.73-9.59, p=0.001)",
  "Treatment",      "A", "1572 (100.0)",                           NA,                           NA,
           NA,      "B", "2951 (100.0)",  "1.48 (1.16-1.88, p=0.001)",  "1.23 (0.95-1.59, p=0.114)"
  )

Is it doable?

Best regards, H

Yes, it's certainly possible (see my answer here for an example - https://stackoverflow.com/questions/62246541/forest-plot-with-table-ggplot-coding/62312135#62312135 ). However, you may need a hand to do this depending on the format of your data and so on. You haven't posted any actual data here (a picture is no good, unless you expect someone to sit and transcribe your picture into actual data) — Allan Cameron, Oct 30 '21 at 17:47

score 4 · Accepted Answer · answered Nov 01 '21 at 21:41

The difficult thing about this task is not making the plot; it is converting your data from a bunch of text strings into a single long-format data frame that can be used for plotting. This involves using regular expressions to capture the appropriate number for each column, pivoting the result, then repeating that process for the second data frame before binding the two frames together. This is unavoidably ugly and complicated, but that is one of the reasons why having data stored in the correct format is so important.

Anyway, the following code performs the necessary operations:

library(dplyr)

wrangler <- function(data){
   grp <- as.character(match.call()$data)
   data %>%
   tidyr::fill(Variable) %>% 
   mutate(Variable = paste(Variable, Level), 
          Number = as.numeric(gsub("^(\\d+).*$", "\\1", Number)),
          univariable_HR =  as.numeric(gsub("^((\\d+|\\.)+).*$", "\\1", `HR.(univariable)`)),
          univariable_lower = as.numeric(gsub("^.+? \\((.+?)-.*$", "\\1", `HR.(univariable)`)),
          univariable_upper = as.numeric(gsub("^.+?-(.+?),.*$", "\\1", `HR.(univariable)`)),
          univariable_p = gsub("^.+?p=*(.+?)\\).*$", "\\1", `HR.(univariable)`),
          multivariable_HR =  as.numeric(gsub("^((\\d+|\\.)+).*$", "\\1", `HR.(multivariable)`)),
          multivariable_lower = as.numeric(gsub("^.+? \\((.+?)-.*$", "\\1", `HR.(multivariable)`)),
          multivariable_upper = as.numeric(gsub("^.+?-(.+?),.*$", "\\1", `HR.(multivariable)`)),
          multivariable_p = gsub("^.+?p=*(.+?)\\).*$", "\\1", `HR.(multivariable)`),
          group = grp) %>% 
   filter(!is.na(univariable_HR)) %>%
   select(-Level, -`HR.(multivariable)`, - `HR.(univariable)`) %>%
   tidyr::pivot_longer(cols = -(c(1:2, 11)), names_sep = "_", names_to = c("type", ".value"))
}

df <- rbind(wrangler(GRP1), wrangler(GRP2))

This now gives us the data in the correct format for plotting. Each row will become a single pointrange in our plot, so it needs a hazard ratio, a lower confidence bound, an upper confidence bound, a variable label, the type (multivariable versus univariable), and the group it originally came from (GRP1 or GRP2):

df
#> # A tibble: 20 x 8
#>    Variable    Number group type             HR lower upper p     
#>    <chr>        <dbl> <chr> <chr>         <dbl> <dbl> <dbl> <chr> 
#>  1 Sex Male      2318 GRP1  univariable    1.13  0.91  1.4  0.265 
#>  2 Sex Male      2318 GRP1  multivariable  1.13  0.91  1.4  0.276 
#>  3 Score 1-2     1637 GRP1  univariable    1.49  1.19  1.87 0.001 
#>  4 Score 1-2     1637 GRP1  multivariable  1.15  0.9   1.47 0.250 
#>  5 Score 3-4      412 GRP1  univariable    1.71  1.14  2.56 0.010 
#>  6 Score 3-4      412 GRP1  multivariable  1.09  0.71  1.67 0.710 
#>  7 Score >=5       42 GRP1  univariable    1.67  0.53  5.21 0.381 
#>  8 Score >=5       42 GRP1  multivariable  0.96  0.3   3.05 0.943 
#>  9 Treatment B   2951 GRP1  univariable    1.74  1.26  2.4  0.001 
#> 10 Treatment B   2951 GRP1  multivariable  1.53  1.09  2.13 0.013 
#> 11 Sex Male      2318 GRP2  univariable    1.7   1.36  2.13 <0.001
#> 12 Sex Male      2318 GRP2  multivariable  1.62  1.28  2.04 <0.001
#> 13 Score 1-2     1637 GRP2  univariable    2.76  1.21  6.29 0.016 
#> 14 Score 1-2     1637 GRP2  multivariable  2.69  1.18  6.13 0.019 
#> 15 Score 3-4      412 GRP2  univariable    5.11  2.26 11.6  <0.001
#> 16 Score 3-4      412 GRP2  multivariable  4.46  1.95 10.2  <0.001
#> 17 Score >=5       42 GRP2  univariable    5.05  2.19 11.6  <0.001
#> 18 Score >=5       42 GRP2  multivariable  4.08  1.73  9.59 0.001 
#> 19 Treatment B   2951 GRP2  univariable    1.48  1.16  1.88 0.001 
#> 20 Treatment B   2951 GRP2  multivariable  1.23  0.95  1.59 0.114

Now that we have the data in this format, the plot itself is straightforward:

library(ggplot2)

ggplot(df, aes(HR, Variable)) +
   geom_pointrange(aes(xmin = lower, xmax = upper, colour = type),
                   position = position_dodge(width = 0.5)) +
   facet_grid(group~., switch = "y") +
   geom_vline(xintercept = 0, linetype = 2) +
   theme_bw() +
   theme(strip.placement = "outside",
         strip.text= element_text(angle = 180),
         strip.background = element_blank(),
         panel.spacing = unit(0, "mm"))

^{Created on 2021-11-01 by the reprex package (v2.0.0)}

Thanks alot Alan. Any idea how to put the GRP1 and 2 labels on right side together with an secondary Y axis label? (Like Variable) — hklovs, Nov 04 '21 at 17:27
@hkolvs you can change the line `facet_grid(group~., switch = "y") +` to `facet_grid(group~.) +` to move the groups to the right side. Adding a second axis label is a bit harder. You would perhaps need to increase the panel margins and add an annotation to do that. It might be easier if you just renamed GRP1 and GRP2 to something like "Variable: Group 1" and "Variable: Group 2" — Allan Cameron, Nov 04 '21 at 17:38

Convert cox regression table to forest plot

1 Answers1