Expand ICD10 codes from one row into multiple rows

Question

I have a dataset like this:

ICD_10	diagnosis
A00	Cholera
A01-A03	Other Intestinal infectious diseases
A15	Respiratory tuberculosis
A17-A19	Other tuberculosis

...

On row 2 and 4, there are multiple ICD-10 codes, and I want to expand them into multiple rows, like below:

ICD_10	diagnosis
A00	Cholera
A01	Other Intestinal infectious diseases
A02	Other Intestinal infectious diseases
A03	Other Intestinal infectious diseases
A15	Respiratory tuberculosis
A17	Other tuberculosis
A18	Other tuberculosis
A19	Other tuberculosis

How can I accomplish this in R using tidyverse?

Thanks for your help!

Is it always the case that the letter in a range is the same letter? For instance, one would never see `A01-B05`? — r2evans, Oct 31 '22 at 21:21
@r2evans - if it's ICD-10 coding, I don't think you should see that sort of thing. You do however see instances like H10-H13 only having H10, H11 and H13, no H12. — thelatemail, Oct 31 '22 at 21:29
@thelatemail, it sounds like you and the OP know what `ICD_10` means, it is not obvious what constraints or standards for it may exist. — r2evans, Oct 31 '22 at 21:31
@thelatemail looks like there is a dedicated packages for this job. — zx8754, Oct 31 '22 at 22:23

score 2 · Answer 1 · answered Oct 31 '22 at 22:22

Using dedicated icd package:

#data
d <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))

#remotes::install_github("jackwasey/icd")
library(icd)

To avoid creating non-existent or missing out existing codes between the ranges we use expand_ranges. For example, below returns 33 codes, instead of 3 if we filled in sequentially A01, A02, A03, which is wrong.

expand_range("A01", "A03")
#  [1] "A01"   "A010"  "A0100" "A0101" "A0102" "A0103" "A0104" "A0105"
#  [9] "A0109" "A011"  "A012"  "A013"  "A014"  "A02"   "A020"  "A021" 
# [17] "A022"  "A0220" "A0221" "A0222" "A0223" "A0224" "A0225" "A0229"
# [25] "A028"  "A029"  "A03"   "A030"  "A031"  "A032"  "A033"  "A038" 
# [33] "A039"

We also use explain_code, to give description for newly created codes, example usage:

explain_code("A01")
# [1] "Typhoid and paratyphoid fevers"

Now, wrap two functions into one, to get a pretty output

# custom function using expand_range
f <- function(icd10, diagnosis){
  x <- unlist(strsplit(icd10, "-"))
  
  if(length(x) == 1){ ICD10 = x 
  } else {ICD10 = expand_range(x[1], x[2])}
  
  data.frame(
    icd10 = icd10,
    diagnosis = diagnosis, 
    icd10range = ICD10,
    desc = explain_code(ICD10))
}

And loop through the codes to expand, then rowbind:

# loop through rows, and rowbind
res <- do.call(rbind, 
               mapply(f, d$ICD_10, d$diagnosis,
                      SIMPLIFY = FALSE, USE.NAMES = FALSE))

head(res)
#     icd10                            diagnosis icd10range                                 desc
# 1     A00                              Cholera        A00                              Cholera
# 2 A01-A03 Other Intestinal infectious diseases        A01       Typhoid and paratyphoid fevers
# 3 A01-A03 Other Intestinal infectious diseases       A010                        Typhoid fever
# 4 A01-A03 Other Intestinal infectious diseases      A0100           Typhoid fever, unspecified
# 5 A01-A03 Other Intestinal infectious diseases      A0101                   Typhoid meningitis
# 6 A01-A03 Other Intestinal infectious diseases      A0102 Typhoid fever with heart involvement

As expected A01-A03 now expanded into 33 rows:

table(res$icd10)
# A00 A01-A03     A15 A17-A19 
#   1      33       1      53

score 1 · Accepted Answer · answered Oct 31 '22 at 21:29

fun <- function(vec) {
  ltr <- substring(vec, 1, 1)
  L <- lapply(strsplit(gsub("[^-0-9]", "", vec), "-"), as.integer)
  mapply(function(ltr, z) sprintf("%s%02i", ltr, if (length(z) > 1) seq(z[1], z[2]) else z),
         ltr, L)
}
quux %>%
  mutate(ICD_10 = fun(ICD_10)) %>%
  tidyr::unnest(ICD_10)
# # A tibble: 8 x 2
#   ICD_10 diagnosis                           
#   <chr>  <chr>                               
# 1 A00    Cholera                             
# 2 A01    Other Intestinal infectious diseases
# 3 A02    Other Intestinal infectious diseases
# 4 A03    Other Intestinal infectious diseases
# 5 A15    Respiratory tuberculosis            
# 6 A17    Other tuberculosis                  
# 7 A18    Other tuberculosis                  
# 8 A19    Other tuberculosis

Data

quux <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))

score 0 · Answer 3 · answered Oct 31 '22 at 21:31

0

One option:

tibble::tribble(
  ~ICD_10, ~diagnosis,
  "A00", "Cholera",
  "A01-A03", "Other Intestinal infectious diseases",
  "A15", "Respiratory tuberculosis",
  "A17-A19", "Other tuberculosis"
) |> 
  tidyr::separate_rows(ICD_10, sep = "-") |> 
  mutate(id = parse_number(ICD_10)) |> 
  group_by(diagnosis) |> 
  complete(id = min(id):max(id)) |> 
  mutate(ICD_10 = paste0("A", id))

answered Oct 31 '22 at 21:31

Philippe Massicotte

1,321
12
24

1

To make it use the letter from the code (when not necessarily A), you could replace last line with: `tidyr::fill(ICD_10) %>% mutate(ICD_10 = paste0(substr(ICD_10, 1, 1), id))` – Jon Spring Oct 31 '22 at 21:38

Expand ICD10 codes from one row into multiple rows

3 Answers3