I have a data.table with ~18^6 rows and I need to take the unique values of CLASS, by ID, and set their respective columns to 1, as seen below in a baby example
DT <- data.table::data.table(ID=c("1","1","1","2","2"),
CLASS=c("a","a","b","c","b"),
a=c(0,0,0,0,0),
b=c(0,0,0,0,0),
c=c(0,0,0,0,0))
### Start with this
ID CLASS a b c
1 a 0 0 0
1 a 0 0 0
1 b 0 0 0
2 c 0 0 0
2 b 0 0 0
### Want this
ID CLASS a b c
1 a 1 1 0
1 a 1 1 0
1 b 1 1 0
2 c 0 1 1
2 b 0 1 1
My first instinct was to try the code below but found that it will set all columns to 1 because unique(DT$CLASS) inherently includes all unique values for all IDs and is not passed through the "grouping" argument parameter so to say.
### Tried this
DT[,unique(DT$CLASS):=1,by=ID]
### Got this
ID CLASS a b c
1 a 1 1 1
1 a 1 1 1
1 b 1 1 1
2 c 1 1 1
2 b 1 1 1
I've been struggling with utilizing data.table to its full potential and speed and would like to create the desired output only using commands within data.table arguments.
Can someone help me write the proper code, using only data.table commands/arguments,so that my jth index includes only the unique values, by ID, and sets the appropriate columns to 1?
Follow-up Question:
Say that each row also has an associated date to it, RXDATE, and I'd want to create respective column names of all class values that holds the minimum RXDATE by CLASS by ID. Could I also turn to dcast for this?
### Start with this
ID CLASS a b c RXDATE
1 a 1 1 0 1-1-99
1 a 1 1 0 1-2-99
1 b 1 1 0 1-3-99
2 c 0 1 1 5-4-00
2 b 0 1 1 6-5-01
### Want this
ID CLASS a b c RXDATE a_DT b_DT c_DT
1 a 1 1 0 1-1-99 1-1-99 1-3-99 NA
1 a 1 1 0 1-2-99 1-1-99 1-3-99 NA
1 b 1 1 0 1-3-99 1-1-99 1-3-99 NA
2 c 0 1 1 5-4-00 NA 6-5-01 5-4-00
2 b 0 1 1 6-5-01 NA 6-5-01 5-4-00