You can use the *apply family of functions. If your source text is a character vector, I recommend vapply, which requires you to specify the type and length of the returned values. Because you use grepl, the returned values are logical vectors.
txt = "My name is Abdur Rohman"
patt = c("na", "Ab","man", "om")
vapply(patt, function(x) grepl(x,txt),
FUN.VALUE = logical(length(txt)))
# na Ab man om
# TRUE TRUE TRUE FALSE
So, in your example you can use:
library(stringi)  # provides stri_rand_lipsum
s = stri_rand_lipsum(10)
vapply(c("conubia","viverra"), function(x) grepl(x,s),
FUN.VALUE = logical(length(s)))
# conubia viverra
# [1,] TRUE TRUE
# [2,] FALSE FALSE
# [3,] TRUE FALSE
# [4,] FALSE FALSE
# [5,] FALSE FALSE
# [6,] FALSE TRUE
# [7,] FALSE FALSE
# [8,] FALSE FALSE
# [9,] FALSE FALSE
#[10,] FALSE FALSE
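If the goal is to find the elements that contain every pattern, the logical matrix can be collapsed row by row. The following is only a sketch under that assumption: the sample vector s_demo and the names m and both are made up for illustration, and fixed = TRUE is my addition so the words are matched literally rather than as regular expressions.

```r
# Hypothetical sample text standing in for stri_rand_lipsum(10)
s_demo <- c("conubia nostra viverra", "lorem ipsum",
            "viverra only", "conubia only")
patt <- c("conubia", "viverra")

# One column per pattern, one row per element of s_demo.
# Note: vapply returns a matrix only when length(s_demo) > 1.
m <- vapply(patt, function(x) grepl(x, s_demo, fixed = TRUE),
            FUN.VALUE = logical(length(s_demo)))

# A row qualifies when every pattern was found in that element
both <- rowSums(m) == length(patt)
s_demo[both]
```

Here only the first element contains both words, so it is the only one returned.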
Edit to include a 140-character window
As for the 140-character window described in your comment, one way to meet the requirement is to extract all characters between the two target strings and then count them. The requirement is met only if the count is less than or equal to 140.
Extracting all characters between two strings can be done with regular expressions in gsub. However, if the strings are repeated, you need to decide which occurrences delimit the window. Let me give examples:
txt <- "Lorem conubia amet conubia ipsum dolor sit amet, finibus torquent diam lobortis dolor ac eget viverra dolor viverra"
This text contains two conubias and two viverras. You have four options for choosing the window that captures all characters between conubia and viverra.
- Option 1: between the last conubia and the first viverra
gsub(".*conubia(.*?)viverra.*", "\\1", txt, perl = TRUE)
#[1] " ipsum dolor sit amet, finibus torquent diam lobortis dolor ac eget "
- Option 2: between the first conubia and the last viverra
gsub(".*?conubia(.*)viverra.*", "\\1", txt, perl = TRUE)
# [1] " amet conubia ipsum dolor sit amet, finibus torquent diam lobortis dolor ac eget viverra dolor "
- Option 3: between the first conubia and the first viverra
gsub(".*?conubia(.*?)viverra.*", "\\1", txt, perl = TRUE)
#[1] " amet conubia ipsum dolor sit amet, finibus torquent diam lobortis dolor ac eget "
- Option 4: between the last conubia and the last viverra
gsub(".*conubia(.*)viverra.*", "\\1", txt, perl = TRUE)
#[1] " ipsum dolor sit amet, finibus torquent diam lobortis dolor ac eget viverra dolor "
To count the extracted characters, nchar can be used.
# Option 1
nchar(gsub(".*conubia(.*?)viverra.*", "\\1", txt, perl = TRUE))
#[1] 68
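To compare the four window choices side by side, the patterns can be stored in a named vector and passed through vapply. This is a sketch on a shortened, made-up string (txt2); the names patterns and window_lengths are illustrative only.

```r
# Short hypothetical text with two occurrences of each word
txt2 <- "a conubia b conubia c viverra d viverra e"

# The four window choices, named by which occurrences they use
patterns <- c(
  last_first  = ".*conubia(.*?)viverra.*",
  first_last  = ".*?conubia(.*)viverra.*",
  first_first = ".*?conubia(.*?)viverra.*",
  last_last   = ".*conubia(.*)viverra.*"
)

# Length of the captured window for each pattern
window_lengths <- vapply(
  patterns,
  function(p) nchar(gsub(p, "\\1", txt2, perl = TRUE)),
  FUN.VALUE = integer(1)
)
window_lengths
```

The greedy `.*` pushes a match toward the last occurrence, while the lazy `.*?` stops at the first one, which is why last_first gives the shortest window and first_last the longest.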
Applying this approach:
set.seed(8)
s1 <- stri_rand_lipsum(10)
Nch <- nchar(gsub(".*conubia(.*?)viverra.*", "\\1", s1, perl = TRUE))
Nch
# [1] 637 42 512 528 595 640 522 407 388 512
we found that the second element of s1 meets the requirement.
To print that element, we can use s1[which(Nch <= 140)].
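One caveat: gsub returns its input unchanged when the pattern does not match, so the count is then the length of the whole string. That is harmless for long paragraphs, but a string of 140 characters or fewer that lacks one of the words would pass the test by mistake. A possible way around this, sketched here with a hypothetical helper window_length, is to use regexec and regmatches and return NA when there is no match. It assumes the Option 1 window (last conubia, first viverra) and that the two words contain no regex metacharacters.

```r
# Hypothetical helper: number of characters between the last `first`
# and the following `second`, or NA when the pattern does not match.
window_length <- function(x, first = "conubia", second = "viverra") {
  pattern <- paste0(".*", first, "(.*?)", second)
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each list element is c(full match, capture group), or empty on no match
  vapply(m,
         function(g) if (length(g) == 2L) nchar(g[2]) else NA_integer_,
         FUN.VALUE = integer(1))
}

# Made-up examples: a match, no words at all, and words in the wrong order
x <- c("Lorem conubia amet conubia ipsum viverra dolor viverra",
       "short text",
       "viverra then conubia")
n <- window_length(x)
n

# Keep only the elements with a real match inside the 140-character window
x[!is.na(n) & n <= 140]
```

With the plain nchar(gsub(...)) approach, "short text" would have a count of 10 and be selected even though it contains neither word.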
Some great references I've been learning from:
- https://www.buymeacoffee.com/wstribizew/extracting-text-two-strings-regular-expressions
- https://regex101.com/
- Extracting a string between other two strings in R