OK, let's say your data has this structure:
patient dg1 dg2 dg3
1 ID1 K30 E11 I50
2 ID2 I48 I50 E10
Then you can process the data as follows:
# load package "tidyverse"
library(tidyverse)
# read data
data <- read.table(header = TRUE, text = "
patient dg1 dg2 dg3
1 ID1 K30 E11 I50
2 ID2 I48 I50 E10
")
# 1. reshape the columns whose names start with "dg" into long format
#    (one row per patient and diagnosis, stored in columns "name" and "value")
# 2. drop the "name" column (it only holds "dg1", "dg2", "dg3" in this case)
# 3. create a dummy variable saying whether the patient has that diagnosis
# 4. pivot back to a wide data frame with one column per diagnosis code
dgs <- data %>%
  pivot_longer(starts_with("dg")) %>%
  select(-name) %>%
  mutate(has = as.numeric(!is.na(value))) %>%
  pivot_wider(names_from = value, values_from = has, values_fill = 0)
The result looks like this:
> dgs
# A tibble: 2 x 6
# patient K30 E11 I50 I48 E10
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 ID1 1 1 1 0 0
# 2 ID2 0 0 1 1 1
1 for having the disease, 0 for not.
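One caveat: the pipeline above assumes every dg cell is filled and no patient has the same code in two columns. Otherwise an empty cell should turn into a spurious NA column, and pivot_wider() would complain about values that are not uniquely identified. Here is a minimal sketch of one way to guard against both, assuming missing diagnoses are read in as NA. The input data below is made up for illustration; values_drop_na and distinct() are the only additions to the original pipeline.

```r
library(tidyverse)

# hypothetical data: ID1 has an empty dg3, ID2 has I50 recorded twice
data <- read.table(header = TRUE, text = "
patient dg1 dg2 dg3
ID1 K30 E11 NA
ID2 I48 I50 I50
")

dgs <- data %>%
  # drop empty diagnosis cells while reshaping to long format
  pivot_longer(starts_with("dg"), values_drop_na = TRUE) %>%
  select(-name) %>%
  # keep each patient/diagnosis pair only once
  distinct() %>%
  # every remaining row is a diagnosis the patient has
  mutate(has = 1) %>%
  pivot_wider(names_from = value, values_from = has, values_fill = 0)

dgs
```

With this data you should get one row per patient and the columns patient, K30, E11, I48 and I50, with no NA column.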