I need to create dummy variables using ICD-10 codes. For example, chapter 2 starts with C00 and ends with D48X. Data looks like this:

data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"))

Then I need to create a column that gets 1 if any code in the row falls within the C00-D48X range and 0 otherwise. The result I want:

LINHAA1   LINHAA2   LINHAA3  CHAPTER2
B342      U071      D48X         1
C000      C99       Y098         1
D450      D68X      X223         1
O985      J061      D640         0

It needs to go through LINHAA1 to LINHAA3. Thanks in advance!

zx8754
Lana Meijinhos
  • Are you sure the data.frame should look like this? You called the variables "LinhaX", which is Portuguese for "LineX", yet in the toy data.frame as created those are actually columns/variables. – GuedesBF Jan 29 '23 at 19:27
  • To create the column CHAPTER2, which other variables/lines are you checking against the target interval? – GuedesBF Jan 29 '23 at 19:29
  • @GuedesBF "Linha" is the name of the column/variable because I'm working with death certificates, and they have Linha A, Linha B, Linha C and Linha D. When it becomes a database, they turn into columns. – Lana Meijinhos Jan 29 '23 at 19:31
  • @GuedesBF to create CHAPTER2 I need to check all LINHA columns from each row. – Lana Meijinhos Jan 29 '23 at 19:33
  • Do you mean if _any one of_ LINHAA1, LINHAA2, or LINHAA3 is in the range the column gets a 1? – G5W Jan 29 '23 at 19:33
  • @GuedesBF Exactly! – Lana Meijinhos Jan 29 '23 at 19:38
  • Ok, @LanaMeijinhos, I think I understand now; see if the answer helps. – GuedesBF Jan 29 '23 at 20:05

3 Answers

This should do it:

as.numeric(apply(apply(data, 1,
    function(x) { x >= "C00" & x <= "D48X" }), 2, any))
[1] 1 1 1 0

A little explanation: whether a code is in the range can be tested with ordinary string comparison, because >= and <= compare character strings in sort order (R uses the collating sequence of the current locale for this). The inner apply tests every element and returns a matrix of logical values, with one column per row of data. The outer apply uses any to check whether at least one of the three logical values for each row is TRUE. as.numeric then converts the result from TRUE/FALSE to 1/0.
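
The same logic can be spelled out one step at a time, which makes the intermediate logical matrix easier to inspect. This is only a sketch in base R: in_range is a hypothetical helper name, and it assumes the columns are character (not factor) and that the current locale sorts digits before letters.

```r
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a code sorts between C00 and D48X
in_range <- function(x) x >= "C00" & x <= "D48X"

# Logical matrix: one column per LINHAA column, one row per record
hits <- sapply(data, in_range)

# A row is flagged when at least one of its codes is in range
data$CHAPTER2 <- as.integer(rowSums(hits) > 0)
```

For the sample data this gives the same 1 1 1 0 as the nested apply call.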

G5W

This is a typical use case for dplyr::if_any. if_any returns TRUE for a row if the given condition is met in any of the tested columns:

library(dplyr)

data %>%
    mutate(CHAPTER2 = +if_any(starts_with("LINHAA"),
                              ~ .x >= 'C00' & .x <= 'D48X'))

  LINHAA1 LINHAA2 LINHAA3 CHAPTER2
1    B342    U071    D48X        1
2    C000     C99    Y098        1
3    D450    D68X    X223        1
4    0985    J061    D640        0
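
To make the unary + less cryptic, here is a sketch that keeps the logical result of if_any in its own column before converting it. The names in_range, any_hit and flagged are hypothetical and only for illustration; as.integer should give the same 0/1 values as the + shortcut.

```r
library(dplyr)

data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a code sorts between lo and hi
in_range <- function(x, lo, hi) x >= lo & x <= hi

flagged <- data %>%
  mutate(
    # TRUE when any LINHAA column of the row is in range
    any_hit  = if_any(starts_with("LINHAA"), ~ in_range(.x, "C00", "D48X")),
    # explicit conversion of TRUE/FALSE to 1/0
    CHAPTER2 = as.integer(any_hit)
  )
```

Passing lo and hi as arguments means the same helper could be reused for other chapter ranges, assuming their codes also sort correctly as plain strings.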
GuedesBF

Using the dedicated icd package:

# remotes::install_github("jackwasey/icd")
library(icd)

#get the 2nd chapter start and end codes
ch2 <- icd::icd10_chapters[[ 2 ]]
# start   end 
# "C00" "D49" 

#expand the range to include all chapter 2 codes
ch2codes <- expand_range(ch2[ "start" ], ch2[ "end" ])
# length(ch2codes)
# 2094

#check if codes in a row match
ix <- apply(data, 1, function(i) any(i %in% ch2codes))
# [1] FALSE  TRUE FALSE FALSE

data$chapter2 <- as.integer(ix)
#data
#   LINHAA1 LINHAA2 LINHAA3 chapter2
# 1    B342    U071    D48X        0
# 2    C000     C99    Y098        1
# 3    D450    D68X    X223        0
# 4    0985    J061    D640        0

Note that some of your codes are not defined according to the package:

#invalid
is_defined("D48X")
# [1] FALSE
explain_code("D48X")
# character(0)

#Valid
is_defined("D48")
# [1] TRUE
explain_code("D48")
# [1] "Neoplasm of uncertain behavior of other and unspecified sites"
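
If the trailing characters are the problem (D48X is not defined but D48 is), one possible workaround is to match on the three-character category only. The following is a rough base R sketch, not part of the icd package: it assumes the first three characters of each code identify the category, and it uses the C00 to D48 range from the question (the chapter table above ends at D49, so the pattern would need adjusting for that edition).

```r
data <- data.frame(LINHAA1 = c("B342", "C000", "D450", "0985"),
                   LINHAA2 = c("U071", "C99", "D68X", "J061"),
                   LINHAA3 = c("D48X", "Y098", "X223", "D640"),
                   stringsAsFactors = FALSE)

# Assumption: the first three characters are the ICD-10 category
cat3 <- substr(as.matrix(data), 1, 3)

# Chapter 2 written as a pattern: C00-C99, D00-D39, D40-D48
ch2_pattern <- "^(C[0-9]{2}|D[0-3][0-9]|D4[0-8])$"

# grepl returns a plain vector, so restore the row/column layout
is_ch2 <- matrix(grepl(ch2_pattern, cat3), nrow = nrow(cat3))

data$chapter2 <- as.integer(rowSums(is_ch2) > 0)
```

With the sample data this flags rows 1 to 3, and because it uses a pattern rather than string ordering, it does not depend on the locale's collation.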
zx8754