In the comments you mention a lookup table. If that is the case, one approach is to join both sets together, then use the regex by Wiktor Stribiżew to flag which pairs are valid.
As I'm joining data sets, I'm going to use data.table.
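As a minimal sketch of what that regex checks (base R only; the names pattern and result are just for illustration): the lookup value is wrapped in a character class, so a string is valid when it consists solely of letters that appear in the lookup value.

```r
# Build the pattern from one lookup value: "^[CBC]+$" means
# "one or more characters, each of which is C or B"
pattern <- paste0("^[", "CBC", "]+$")

# Test the two dummy values used in this answer
result <- grepl(pattern, c("BCC", "ABB"))
result
# [1]  TRUE FALSE
```

Note that this only checks which letters occur, not how often, so "BBB" would also pass against "CBC". If the lookup values could contain characters that are special inside brackets (such as ^, - or ]), they would need escaping first.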
Method 1: Join everything
library(data.table)
## dummy data, and a lookup table
dt <- data.frame(V1 = c("BCC", "ABB"))
dt_lookup <- data.frame(V1 = c("CBC", "BAB", "CCB"))
## convert to data.table
setDT(dt); setDT(dt_lookup)
## add some indexes to keep track of rows from each dt
dt[, idx := .I]
dt_lookup[, l_idx := .I]
## create a column to join on
dt[, key := 1L]
dt_lookup[, key := 1L]
## join EVERYTHING (note: this overwrites dt with the joined result)
dt <- dt[
  dt_lookup
  , on = "key"
  , allow.cartesian = TRUE
]
## regex: build one pattern per row from the lookup value (i.V1) and test V1 against it
dt[
  , valid := grepl(paste0("^[", i.V1, "]+$"), V1)
  , by = 1:nrow(dt)
]
#     V1 idx key i.V1 l_idx valid
# 1: BCC   1   1  CBC     1  TRUE
# 2: ABB   2   1  CBC     1 FALSE
# 3: BCC   1   1  BAB     2 FALSE
# 4: ABB   2   1  BAB     2  TRUE
# 5: BCC   1   1  CCB     3  TRUE
# 6: ABB   2   1  CCB     3 FALSE
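If the row-by-row grouping gets slow on larger data, one possible alternative is to pair each pattern with its value via mapply(). This is only a sketch under the same dummy data; the name joined is introduced here so that dt is not overwritten.

```r
library(data.table)

## same dummy data as above
dt <- data.table(V1 = c("BCC", "ABB"))
dt_lookup <- data.table(V1 = c("CBC", "BAB", "CCB"))

## constant column to cross join on
dt[, key := 1L]
dt_lookup[, key := 1L]

## keep the join result in a separate object (hypothetical name)
joined <- dt[dt_lookup, on = "key", allow.cartesian = TRUE]

## mapply() calls grepl() once per (pattern, value) pair, without `by`
joined[, valid := mapply(grepl, paste0("^[", i.V1, "]+$"), V1, USE.NAMES = FALSE)]

## keep only the valid pairs
joined[valid == TRUE, .(V1, match = i.V1)]
```

This still runs one regex per row, so any gain would come from skipping the grouping overhead rather than from doing less matching.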
Method 2: EACHI join
A slightly more memory-efficient approach might be to use this technique by Jaap, as it avoids the 'join everything' step and instead joins 'by each i', that is, one row of dt at a time.
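## run this on dt as prepared above (with idx and key), before the Method 1 join overwrote it
dt_lookup[
  dt,
  {
    ## roles are swapped relative to Method 1: the pattern is built from
    ## dt's value (i.V1) and tested against the lookup values (V1)
    valid <- grepl(paste0("^[", i.V1, "]+$"), V1)
    .(
      V1 = V1[valid]
      , idx = i.idx
      , match = i.V1
      , l_idx = l_idx[valid]
    )
  }
  , on = "key"
  , by = .EACHI
]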
#    key  V1 idx match l_idx
# 1:   1 CBC   1   BCC     1
# 2:   1 CCB   1   BCC     3
# 3:   1 BAB   2   ABB     2
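To see what by = .EACHI does in isolation, here is a small sketch on the same dummy data that only counts the valid lookup entries for each row of dt. The names counts, value and n_valid are made up for this example.

```r
library(data.table)

## same dummy data as above, with the constant join column
dt <- data.table(V1 = c("BCC", "ABB"))
dt_lookup <- data.table(V1 = c("CBC", "BAB", "CCB"))
dt[, key := 1L]
dt_lookup[, key := 1L]

## j is evaluated once per row of dt (the `i` table):
## i.V1 is a single string, V1 holds all matched lookup values
counts <- dt_lookup[
  dt
  , .(value = i.V1, n_valid = sum(grepl(paste0("^[", i.V1, "]+$"), V1)))
  , on = "key"
  , by = .EACHI
]
counts
```

Because each group is summarised as soon as it is formed, the full cross product never has to be held in memory at once, which is where the memory saving would come from.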