I have the same question as the one answered in "R - Find all vector elements that contain all strings / patterns - str_detect grep", but the suggested solution takes too long.
I have 73,360 observations containing sentences. I want TRUE returned for every sentence that contains ALL of the search strings.
sentences <- c("blue green red",
"blue green yellow",
"green red yellow ")
search_terms <- c("blue","red")
pattern <- paste0("(?=.*", search_terms,")", collapse="")
grepl(pattern, sentences, perl = TRUE)
-output
[1] TRUE FALSE FALSE
This gives the right result, but it takes a very long time. Is there a faster way? I tried str_detect and it was just as slow.
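To make the cost of this approach easier to see, here is a small sketch that prints the pattern the `paste0()` call builds. Each search term becomes its own lookahead containing `.*`, so the regex engine may rescan a sentence once per term. That is a likely reason for the slowdown, though it has not been profiled here.

```r
sentences <- c("blue green red",
               "blue green yellow",
               "green red yellow ")
search_terms <- c("blue", "red")

# One lookahead per term, concatenated into a single PCRE pattern
pattern <- paste0("(?=.*", search_terms, ")", collapse = "")
print(pattern)
# "(?=.*blue)(?=.*red)"

result <- grepl(pattern, sentences, perl = TRUE)
print(result)
# TRUE FALSE FALSE
```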
BTW, the "sentences" contain special characters like [],.- but no special characters like ñ.
UPDATE: Below are my benchmark results using the suggested methods, thanks to @onyambu's input.
Unit: milliseconds
                  expr       min        lq      mean    median        uq      max neval
         OP_solution() 7033.7550 7152.0689 7277.8248 7251.8419 7391.8664 7690.964   100
      map_str_detect() 2239.8715 2292.1271 2357.7432 2348.9975 2397.1758 2774.349   100
 unlist_lapply_fixed()  308.1492  331.9948  345.6262  339.9935  348.9907  586.169   100
Reduce_lapply wins! Thanks, @onyambu.
Unit: milliseconds
                  expr       min        lq      mean    median        uq       max neval
       Reduce_lapply()  49.02941  53.61291  55.96418  55.31494  56.76109  80.64735   100
 unlist_lapply_fixed() 318.25518 335.58883 362.03831 346.71509 357.97142 566.95738   100
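The bodies of the benchmarked functions are not shown in this post. As a rough sketch only, a Reduce/lapply approach with fixed matching might look like the following. It assumes each search term is tested separately with `grepl(fixed = TRUE)` and the per-term results are combined with `&`. The helper name `all_terms_present` is hypothetical and is not the function that was timed above.

```r
# Hypothetical helper: TRUE where a sentence contains every search term.
all_terms_present <- function(sentences, search_terms) {
  # One logical vector per term; fixed = TRUE matches the term literally,
  # so characters such as [ ] , . - are not interpreted as regex syntax.
  hits <- lapply(search_terms, grepl, x = sentences, fixed = TRUE)
  # Element-wise AND across the per-term vectors
  Reduce(`&`, hits)
}

sentences <- c("blue green red",
               "blue green yellow",
               "green red yellow ")
res <- all_terms_present(sentences, c("blue", "red"))
print(res)
# TRUE FALSE FALSE
```

Because matching is literal, a term such as "[b]" or ".-" is searched for as-is, which suits sentences containing characters like [],.- without any escaping.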