I have a data table like this:
> x
    part        colig
 1:   PR     PT, PMDB
 2: PMDB     PT, PMDB
 3: PMDB     PT, PMDB
 4:  PDT     PT, PMDB
 5: PMDB     PT, PMDB
 6:  PFL PSDB,PFL,PTB
 7:  PPB PSDB,PFL,PTB
 8: PMDB PSDB,PFL,PTB
 9: PMDB PSDB,PFL,PTB
10:  PPB PSDB,PFL,PTB
> str(x)
Classes ‘data.table’ and 'data.frame': 10 obs. of 2 variables:
$ part : chr "PR" "PMDB" "PMDB" "PDT" ...
$ colig:List of 10
..$ : chr "PT" "PMDB"
..$ : chr "PT" "PMDB"
..$ : chr "PT" "PMDB"
..$ : chr "PT" "PMDB"
..$ : chr "PT" "PMDB"
..$ : chr "PSDB" "PFL" "PTB"
..$ : chr "PSDB" "PFL" "PTB"
..$ : chr "PSDB" "PFL" "PTB"
..$ : chr "PSDB" "PFL" "PTB"
..$ : chr "PSDB" "PFL" "PTB"
- attr(*, ".internal.selfref")=<externalptr>
and I want to create a dummy variable that is TRUE when the value of part appears in the same row's colig list. My desired output is:
> x
    part        colig dummy
 1:   PR     PT, PMDB FALSE
 2: PMDB     PT, PMDB  TRUE
 3: PMDB     PT, PMDB  TRUE
 4:  PDT     PT, PMDB FALSE
 5: PMDB     PT, PMDB  TRUE
 6:  PFL PSDB,PFL,PTB  TRUE
 7:  PPB PSDB,PFL,PTB FALSE
 8: PMDB PSDB,PFL,PTB FALSE
 9: PMDB PSDB,PFL,PTB FALSE
10:  PPB PSDB,PFL,PTB FALSE
My problem is accessing the elements inside the list in the second column. I'm trying something like:
x[, dummy := x[,part] %in% x[, colig]]
or
x[, dummy := x[,part] %in% unlist(x[, colig])]
Both options give the wrong result. In the first case, the dummy is always FALSE; in the second, unlist() flattens colig into a single vector containing the elements from every row, not only from the respective row.
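To make the problem concrete, here is a small base-R sketch of what I believe is happening in both attempts. The two-element vectors are made up for illustration and are not my real data:

```r
# Illustrative two-row stand-ins for the part and colig columns (not the real data)
part  <- c("PR", "PMDB")
colig <- list(c("PT", "PMDB"), c("PSDB", "PFL", "PTB"))

# Attempt 1: %in% compares against whole list elements, not the strings inside them
first_try <- part %in% colig

# Attempt 2: unlist() pools the strings from every row into one vector
pooled     <- unlist(colig)
second_try <- part %in% pooled

print(first_try)
print(pooled)
print(second_try)
```

In the second attempt, a party counts as a match as soon as it appears in any row's colig, which is not what I want.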
I also tried lapply (as in the question "Creating dummy variables in R data.table"):
x[, dummy := lapply( x[,part], function(y) y %in% unlist(x[,colig]))]
which I think is correct, but it is too slow because I have a lot of rows.
Is there any faster option?
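For reference, the row-by-row comparison I am after can be sketched with mapply, pairing each part with its own colig element. This assumes the data.table package is installed and rebuilds the example table by hand. I have not benchmarked it, so it only states the intended result and makes no claim about speed:

```r
library(data.table)

# Rebuild the example table; colig is a list column with one character vector per row
x <- data.table(
  part  = c("PR", "PMDB", "PMDB", "PDT", "PMDB",
            "PFL", "PPB", "PMDB", "PMDB", "PPB"),
  colig = rep(list(c("PT", "PMDB"), c("PSDB", "PFL", "PTB")), each = 5)
)

# Compare each part only against the colig vector from the same row
x[, dummy := mapply(function(p, cl) p %in% cl, part, colig, USE.NAMES = FALSE)]

print(x)
```

The dummy column produced this way should match the desired output shown above.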