
I have a data frame like the one below:

[Image: input data frame with member, code, and score columns]

I would like to convert the data frame as shown below. The values in the code column are categorical, and each code needs to become its own one-hot encoded column. The challenge is that the encoding has to be done at the member level. For example, in the picture below, the summary row for patient 1 has A123 coded as 1, B123 coded as 1, C123 coded as 1, and D123 coded as 0, and score is the average of patient 1's scores.

[Image: desired output, one row per member with a 0/1 column per code and the average score]

  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Jan 12 '22 at 12:49

1 Answer

dt <- data.frame(
  member = c("patient1", "patient1", "patient1", "patient2", "patient3", "patient3"),
  code = c("A123", "B123", "C123", "A123", "B123", "D123"),
  score = c(3, 2, 1, 3, 2, 5)
)

library(data.table)
setDT(dt)

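# code is a character column in R >= 4.0, so levels() would return NULL here
cols <- sort(unique(dt$code))

# wide format: one column per code holding the score (NA where the code is absent)
out <- dcast(dt, member ~ code, value.var = "score")
# average the available scores per member
out[, score := rowMeans(.SD, na.rm = TRUE), .SDcols = cols]
# turn the code columns into 0/1 indicators
out[, (cols) := lapply(.SD, function(x) +!is.na(x)), .SDcols = cols]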

out

     member A123 B123 C123 D123 score
1: patient1    1    1    1    0   2.0
2: patient2    1    0    0    0   3.0
3: patient3    0    1    0    1   3.5
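
If a member can have the same code more than once, the wide table no longer holds a single score per cell, so averaging across its columns would not give the member's mean score. A minimal sketch for that case, assuming the same member, code, and score columns, builds the indicators and the average separately and then joins them. The extra repeated row for patient3 and the names `ind` and `avg` are illustrative assumptions, not part of the original data.

```r
library(data.table)

# example data plus one repeated member/code pair (illustrative assumption)
dt <- data.table(
  member = c("patient1", "patient1", "patient1", "patient2",
             "patient3", "patient3", "patient3"),
  code   = c("A123", "B123", "C123", "A123", "B123", "D123", "D123"),
  score  = c(3, 2, 1, 3, 2, 5, 2)
)

# count rows per member and code (0 where the combination is absent)
ind <- dcast(dt, member ~ code, value.var = "score", fun.aggregate = length)

# cap the counts at 1 to get 0/1 indicators
cols <- setdiff(names(ind), "member")
ind[, (cols) := lapply(.SD, function(x) as.integer(x > 0)), .SDcols = cols]

# average score per member, computed from the long data
avg <- dt[, .(score = mean(score)), by = member]

# join the indicators and the average on member
out <- merge(ind, avg, by = "member")
out
```

Computing the mean from the long table keeps every score in the average, while the indicator columns only record whether a code occurred at all.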
Merijn van Tilborg