Extracting new columns in R

Question

I am a newbie with R. I have a dataset with patient IDs in the first column, followed by 200 columns of ICD10 codes for every patient. I would like to add columns for specific ICD10 codes that state whether the patient has that code (1 = yes, 0 = no). Is there an elegant way to do this?

Thank you very much for your help!

Hi Caro, welcome to Stack Overflow. It will be much easier to help if you provide an accurate stand-in example for your data. Preferably, this would be created in R and provided as an [edit] to your question as the output of the `dput()` function. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/) for more info. — Ian Campbell, Jun 11 '20 at 22:21
Hi, would you mind showing us what the data set looks like? I think you can use the `mutate()` function from the dplyr package, but it would be easier to help if you could provide a sample data set. — Daisy Chang, Jun 11 '20 at 22:25

score 0 · Accepted Answer · answered Jun 11 '20 at 22:50

OK, let's say you have this structure of data:

    patient  dg1  dg2  dg3
1       ID1  K30  E11  I50
2       ID2  I48  I50  E10

Then you can process the data as follows:

# load package "tidyverse"
library(tidyverse)

# read data
data <- read.table(header = TRUE, text = "
      patient  dg1  dg2  dg3
  1       ID1  K30  E11  I50
  2       ID2  I48  I50  E10
")

# 1. reshape the columns whose names start with "dg" to long format
#    (one row per patient and code), dropping empty (NA) diagnosis slots
# 2. drop the "name" column (the original column names "dg1", "dg2", "dg3")
#    and remove codes that are repeated within a patient
# 3. create a dummy variable saying that the patient has that code
# 4. reshape back to wide format: one column per code, filled with 0 where absent
dgs <- data %>%
  pivot_longer(starts_with("dg"), values_drop_na = TRUE) %>%
  select(-name) %>%
  distinct() %>%
  mutate(has = as.numeric(!is.na(value))) %>%
  pivot_wider(names_from = value, values_from = has, values_fill = 0)

The result looks like this:

> dgs
#    A tibble: 2 x 6
#    patient   K30   E11   I50   I48   E10
#    <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>
#  1 ID1         1     1     1     0     0
#  2 ID2         0     0     1     1     1

1 for having the disease, 0 for not.
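
If you only need indicator columns for a handful of specific codes, as the question asks, a base R sketch along the following lines may be enough. It assumes the same layout as above (diagnosis columns named `dg1`, `dg2`, ...) and exact code matches. The vector `target_codes`, the code "J45", and the third patient with empty diagnosis slots are made up for illustration.

```r
# Example data in the same layout as above; NA marks an unused diagnosis slot
# (the third patient is a hypothetical addition for illustration)
data <- data.frame(
  patient = c("ID1", "ID2", "ID3"),
  dg1 = c("K30", "I48", "E11"),
  dg2 = c("E11", "I50", NA),
  dg3 = c("I50", "E10", NA),
  stringsAsFactors = FALSE
)

# Hypothetical list of codes you are interested in
target_codes <- c("E11", "I50", "J45")

# Names of all diagnosis columns (assumes they start with "dg")
dg_cols <- grep("^dg", names(data), value = TRUE)

# Character matrix with one row per patient and one column per diagnosis slot
dg_matrix <- as.matrix(data[dg_cols])

# For each code, add a 0/1 column: 1 if the code appears in any diagnosis slot
for (code in target_codes) {
  data[[code]] <- as.integer(rowSums(dg_matrix == code, na.rm = TRUE) > 0)
}

print(data)
```

Unlike the reshaping approach, this keeps the original 200 columns in place and also creates a column of zeros for a code that no patient has. If your codes carry subcategories (for example "E11.9"), you would need to match on a prefix instead of using `==`.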
