I have a dataset stored as a data.table DT that looks like this:
print(DT)
             category industry
1:     administration    admin
2: nurse practitioner    truck
3:           trucking    truck
4:     administration    admin
5:        warehousing    nurse
6:        warehousing    admin
7:           trucking    truck
8: nurse practitioner    nurse
9: nurse practitioner    truck
I would like to reduce the table to only the rows where the industry matches the category. My general approach is to use grepl() to match each row of DT$category against the regex '^{{IND}}[a-z ]+$', with the corresponding row of DT$industry inserted in place of {{IND}} using infuse() from the infuser package. I struggled to find a sleek data.table solution that would loop through the table and make within-row comparisons, so I resorted to a for-loop to get the job done:
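library(data.table)
library(infuser)  # provides infuse()
template <- "^{{IND}}[a-z ]+$"
DT[, match := FALSE]
for (i in seq_len(nrow(DT))) {
  ind <- DT$industry[i]
  categ <- DT$category[i]
  # fill this row's industry into the template, then test the category against it
  if (grepl(infuse(template, IND = ind), categ)) {
    DT[i, match := TRUE]  # update row i by reference
  }
}
DT <- DT[match == TRUE][, match := NULL]  # keep matching rows, drop the helper column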
print(DT)
             category industry
1:     administration    admin
2:           trucking    truck
3:     administration    admin
4:           trucking    truck
5: nurse practitioner    nurse
However, I am sure this can be done in a better way. Any suggestions for how I could achieve this result by utilizing the data.table package's functionality? It's my understanding that, in this context, an approach that uses the package would likely be more efficient than a for-loop.
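For reference, here is a minimal sketch of one vectorized alternative to the loop above, not a benchmarked or definitive answer. It rebuilds the example table, constructs one pattern per row with paste0() (swapped in for infuse() to avoid the extra dependency), and uses mapply() to test each category against its own row's pattern inside the i argument of DT. The name res is only illustrative, and the sketch assumes the industry values contain no regex metacharacters.

```r
library(data.table)

# Rebuild the example table from the question.
DT <- data.table(
  category = c("administration", "nurse practitioner", "trucking",
               "administration", "warehousing", "warehousing",
               "trucking", "nurse practitioner", "nurse practitioner"),
  industry = c("admin", "truck", "truck", "admin", "nurse",
               "admin", "truck", "nurse", "truck")
)

# One pattern per row, e.g. "^admin[a-z ]+$".
# mapply() pairs the i-th pattern with the i-th category, so each
# comparison stays within its own row; the logical result filters DT.
res <- DT[mapply(grepl, paste0("^", industry, "[a-z ]+$"), category,
                 USE.NAMES = FALSE)]
print(res)
```

This still calls grepl() once per row, so it mainly removes the explicit loop and the helper column rather than changing the amount of regex work. If a plain prefix check is enough, DT[startsWith(category, industry)] is fully vectorized, though it also keeps rows where category equals industry exactly, which the [a-z ]+ part of the pattern would reject.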