
I am trying to detect exact matches with str_detect() in R, but I have not been able to find a clear solution.

Suppose I have

test <- c("HR", "p-value (stratified)", "HRf", "HR-fake", "p-value", "p-value (unstratified)")
want <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

The best way would be just to simply

> test == "HR" | test == "p-value (stratified)"
[1]  TRUE  TRUE FALSE FALSE FALSE FALSE

but for the sake of learning, I wish to do it in regex. However, none of these worked for me.

> str_detect(test, "HR|p-value (stratified)")
[1]  TRUE FALSE  TRUE  TRUE FALSE FALSE
> str_detect(test, "\\bHR\\b|\\bp-value (stratified)\\b")
[1]  TRUE FALSE FALSE  TRUE FALSE FALSE

It seems the problem is that str_detect() is

  1. Detecting "HR-fake" even with "\\bHR\\b"

    > str_detect("HRf", "\\bHR\\b")
    [1] FALSE
    > str_detect("HR-fake", "\\bHR\\b")
    [1] TRUE
    > str_detect("HR - fake", "\\bHR\\b")
    [1] TRUE

  2. Not detecting "p-value (stratified)" even with "p-value (stratified)"

    > str_detect("p-value (stratified)", "p-value (stratified)")
    [1] FALSE

What is causing the issue here? Thank you.

aiorr
  • `length(test) != length(want)`, which is correct? – r2evans Oct 13 '21 at 17:05
    I'd recommend first going through some regex tutorials. There are several special characters in regex that you've used in your patterns here, including parentheses and dashes. You also need parentheses around an _or_ group, such as `(HR|p)` – camille Oct 13 '21 at 17:09

1 Answer


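In addition to the points in the comments, the pattern needs to be anchored at the start (^) and end ($) of the string. Otherwise it still matches "HR-fake": a word boundary (\\b) prevents a match on "HRf", but "-" is not a word character, so there is also a boundary between "R" and "-". The parentheses in "p-value (stratified)" are regex metacharacters, so they must be escaped to match literally.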

 str_detect(test, regex("^(HR|p-value \\(stratified\\))$"))
[1]  TRUE  TRUE FALSE FALSE FALSE FALSE
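
To see each cause in isolation, here is a minimal sketch that reuses the `test` vector from the question and assumes the stringr package is installed. The name `exact` is only illustrative.

```r
library(stringr)

test <- c("HR", "p-value (stratified)", "HRf", "HR-fake",
          "p-value", "p-value (unstratified)")

# "\\b" matches between a word character and a non-word character,
# so "HR" is followed by a boundary in "HR-fake" but not in "HRf".
str_detect("HR-fake", "\\bHR\\b")  # TRUE
str_detect("HRf", "\\bHR\\b")      # FALSE

# Unescaped parentheses create a group; they do not match "(" and ")".
str_detect("p-value (stratified)", "p-value (stratified)")      # FALSE
str_detect("p-value (stratified)", "p-value \\(stratified\\)")  # TRUE

# Anchoring the whole alternation keeps only exact matches.
exact <- str_detect(test, "^(HR|p-value \\(stratified\\))$")
exact
```

The outer parentheses matter here: without them, `^` would apply only to the first alternative and `$` only to the last.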
akrun