
I want to find rows that contain any of several strings and store them in a variable, but I keep getting errors.

queries <- httpdf %>% filter(str_detect(payload, "create" || "drop" || "select"))
Error: invalid 'x' type in 'x || y'

queries <- httpdf %>% filter(str_detect(payload, "create" | "drop" | "select"))
Error: operations are possible only for numeric, logical or complex types

queries1 <- httpdf %>% filter(str_detect(payload, "create", "drop", "select"))
Error: unused arguments ("drop", "select")

None of these worked. Is there another way to do it with str_detect, or should I try something else? I also want the matches to show up in the same column.

Karolis Koncevičius
Magick.M

3 Answers


For a short list of strings like yours, an even simpler way, in my opinion, is to write the alternatives directly into one pattern, separated by |:

queries <- httpdf %>% filter(str_detect(payload, "create|drop|select"))

This is exactly the pattern that

[...] paste(c("create", "drop", "select"),collapse = '|')) [...]

produces, as recommended by @penguin.

For a longer list of strings, I would first store the individual strings in a vector and then use @penguin's approach, e.g.:

strings <- c("string1", "string2", "string3", "string4", "string5", "string6")
queries <- httpdf %>% 
  filter(str_detect(payload, paste(strings, collapse = "|")))

This has the advantage that you can reuse the strings vector later if you need to.
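One caveat when building the pattern from a vector: each element is interpreted as a regular expression, so characters such as "." or "(" keep their special meaning. The sketch below escapes them first. Note that escape_regex is a hypothetical helper written for this illustration, not a stringr function, and the sample strings are made up (recent stringr versions also provide str_escape() for the same purpose).

```r
library(stringr)

# Hypothetical helper (not part of stringr): put a backslash in front of
# every regex metacharacter so the string is matched literally.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

strings <- c("a.b", "drop")  # unescaped, "." would match any character
pattern <- paste(escape_regex(strings), collapse = "|")

payload <- c("a.b", "axb", "drop table")  # made-up sample data
hits <- str_detect(payload, pattern)
print(hits)
```

Without the escaping step, "axb" would also match, because "a.b" as a regex means "a", any character, "b".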

fabilous

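One way to solve this problem: str_detect takes a single regular expression as its pattern, so join the strings with | (regex alternation):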

queries1 <- httpdf %>% 
  filter(str_detect(payload, paste(c("create", "drop", "select"),collapse = '|')))
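If only whole words should match, the same collapsed pattern can be wrapped in word boundaries. This is a minimal sketch with made-up sample strings; the variable names are illustrative.

```r
library(stringr)

words <- c("create", "drop", "select")

# Wrap the alternation in a non-capturing group so that \\b applies to
# every alternative, not just the first and the last one.
pattern <- paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")

payload <- c("recreated the table", "create table t", "dropping", "drop it")
whole_word <- str_detect(payload, pattern)
print(whole_word)
```

Here "recreated" and "dropping" contain a search string only as part of a longer word, so they are not matched.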
Marcus Nunes
penguin
  • With this example I'm getting "creator" (from "the creator is nice") because of "creat", how do I match only the exact word? – RxT Jul 10 '20 at 13:53
  • Just a heads up that you need to escape reserved regex characters in your strings, for instance replace "." with "\\.", etc. – user2363777 Apr 26 '23 at 14:07

I suggest using loops for such operations. They are much more versatile, IMHO.

An example httpdf table (also to answer the comment of RxT):

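library(tidyverse)  # provides tibble(), %>%, filter() and str_detect()

httpdf <- tibble(
  payload = c(
    "the createor is nice",  # contains "create" only as part of a longer word
    "try to create something to select",
    "never catch a dropping knife",
    "drop it like it's hot",
    NA,
    "totally unrelated" ),
  other_optional_columns = 1:6 )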

I use sapply to loop over the search strings and pass each one to str_detect as an individual pattern. This returns a logical matrix with one column per search string and one row per subject string. rowSums then collapses it into a single logical vector that is TRUE wherever at least one pattern matched (na.rm = TRUE makes an NA payload count as no match).

queries1 <-
  httpdf[ 
    sapply(
      c("create", "drop", "select"),
      str_detect,
      string = httpdf$payload ) %>%
    rowSums( na.rm = TRUE ) != 0, ]
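To make the intermediate step visible, here is a small sketch that builds the matrix on its own before collapsing it. The shortened payload vector is made up for the illustration.

```r
library(stringr)

payload <- c(
  "try to create something to select",
  "drop it like it's hot",
  NA,
  "unrelated")

# One column per search string, one row per payload entry
m <- sapply(c("create", "drop", "select"), str_detect, string = payload)
print(m)

# TRUE where at least one column is TRUE; the all-NA row sums to 0
keep <- rowSums(m, na.rm = TRUE) != 0
print(keep)
```

The first row matches two patterns, the second matches one, and the NA and unrelated rows match none.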

And of course it can be wrapped in a function to use inside a tidyverse filter:

## function
str_detect_mult <-
  function( subject, query ) {
    sapply(
      query,
      str_detect,
      string = subject ) %>%
    rowSums( na.rm = TRUE ) != 0
}
## tidy code
queries1 <- httpdf %>% filter( str_detect_mult( payload, c("create", "drop", "select") ) )

Word boundaries are easy to handle if you want exact word matches ("\\b" matches a word boundary and is joined to the start and end of each search string):

str_detect_mult_exact <-
  function( subject, query ) {
    sapply(
      query,
      function(.x)
        str_detect(
          subject,
          str_c("\\b",.x,"\\b") ) ) %>%
    rowSums( na.rm = TRUE ) != 0
}

You can just as easily control how many of the strings must match (e.g. if you want only rows matching exactly one of the strings, i.e. XOR):

str_detect_mult_xor <-
  function( subject, query ) {
    sapply(
      query,
      str_detect,
      string = subject ) %>%
    rowSums( na.rm = TRUE ) == 1
}
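Both variants differ only in how the per-row match count is compared. The sketch below separates the counting from the comparison to show this; count_matches is a hypothetical helper name and the sample data is made up.

```r
library(stringr)

# Hypothetical helper: number of search strings found in each subject string
count_matches <- function(subject, query) {
  m <- sapply(query, str_detect, string = subject)
  rowSums(m, na.rm = TRUE)
}

payload <- c(
  "the createor is nice",
  "try to create something to select",
  "drop it like it's hot",
  NA)

n <- count_matches(payload, c("create", "drop", "select"))
any_match   <- n != 0  # at least one search string found
exactly_one <- n == 1  # exactly one search string found
print(n)
```

The second entry contains both "create" and "select", so it passes the "any" test but not the "exactly one" test.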

Also works in base R:

## function
str_detect_mult <-
  function( subject, query ) {
    rowSums(sapply(
      query,
      grepl,
      x = subject ), na.rm = TRUE ) != 0
}
## base R subsetting
queries1 <- httpdf[ str_detect_mult( httpdf$payload, c("create", "drop", "select") ), ]
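One edge case of the sapply approach: when subject is a single string, sapply returns a plain vector instead of a matrix, and rowSums then fails. A possible alternative, sketched here under the hypothetical name detect_any, combines the per-pattern results with Reduce and works for any input length.

```r
# Hypothetical variant: OR the per-pattern logical vectors together.
# grepl() returns FALSE for NA input, so no extra NA handling is needed.
detect_any <- function(subject, query) {
  Reduce(`|`, lapply(query, grepl, x = subject))
}

payload <- c("try to create something", NA, "unrelated")  # made-up data
res_vec <- detect_any(payload, c("create", "drop", "select"))
res_one <- detect_any("drop it", c("create", "drop", "select"))
print(res_vec)
print(res_one)
```

It can be used for subsetting in the same way, e.g. httpdf[ detect_any( httpdf$payload, c("create", "drop", "select") ), ].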