I am new to R. I have a dataset with patient IDs in the first column, followed by 200 columns of ICD10 codes for each patient. I would like to add columns for specific ICD10 codes that state whether the patient has that code (1 = yes, 0 = no). Is there an elegant way to do this?

Thank you very much for your help!

Caro
  • Hi Caro, welcome to Stack Overflow. It will be much easier to help if you provide an accurate stand-in example for your data. Preferably, this would be created in R and added as an edit to your question as the output of the `dput()` function. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/) for more info. – Ian Campbell Jun 11 '20 at 22:21
  • Hi, would you mind showing us what the data set looks like? I think you can use the `mutate()` function from the dplyr package, but we could help more if you provided a data set. – Daisy Chang Jun 11 '20 at 22:25

1 Answer

OK, let's say you have this structure of data:

    patient  dg1  dg2  dg3
1       ID1  K30  E11  I50
2       ID2  I48  I50  E10

Then you can process the data as follows:

# load package "tidyverse"
library(tidyverse)

# read data
data <- read.table(header = TRUE, text = "
      patient  dg1  dg2  dg3
  1       ID1  K30  E11  I50
  2       ID2  I48  I50  E10
")

# 1. reshape the columns whose names start with "dg" into long format (columns "name" and "value")
# 2. drop the "name" column (it only holds "dg1", "dg2", "dg3" in this case)
# 3. create a dummy variable marking that the patient has the code in "value"
# 4. spread the codes into a wide data frame, filling absent patient/code pairs with 0
dgs <- data %>%
  pivot_longer(starts_with("dg")) %>%
  select(-name) %>%
  mutate(has = as.numeric(!is.na(value))) %>%
  pivot_wider(names_from = value, values_from = has, values_fill = 0)

The result looks like this:

> dgs
#    A tibble: 2 x 6
#    patient   K30   E11   I50   I48   E10
#    <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>
#  1 ID1         1     1     1     0     0
#  2 ID2         0     0     1     1     1

A 1 means the patient has that code; a 0 means they do not.
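If you only need indicators for a handful of specific codes, as the question asks, a base R loop may be enough. The following is a minimal sketch, not the only way to do it. The vector `codes` and the `has_` column prefix are made up for illustration, and it assumes the diagnosis columns all start with "dg" as in the example above. Matching is exact, so a subcode such as "E11.9" would not count as "E11".

```r
# same example data as above
data <- read.table(header = TRUE, text = "
      patient  dg1  dg2  dg3
  1       ID1  K30  E11  I50
  2       ID2  I48  I50  E10
")

# hypothetical codes of interest; replace with your own
codes <- c("E11", "I50")

# names of the diagnosis columns (assumed to start with "dg")
dg_cols <- grep("^dg", names(data), value = TRUE)

# for each code, flag patients who have it in any diagnosis column;
# na.rm = TRUE makes empty (NA) diagnosis slots count as "no match"
for (code in codes) {
  data[[paste0("has_", code)]] <-
    as.integer(rowSums(data[dg_cols] == code, na.rm = TRUE) > 0)
}

print(data)
```

For the example data this adds `has_E11` (1 for ID1, 0 for ID2) and `has_I50` (1 for both patients). The original columns are kept, every patient stays in the data even if all diagnosis slots are empty, and a code that appears twice for the same patient still gives a single 1.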