2

I have a character vector in R:

string <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")

I want to keep all matches that have FLT1 but not when other alphanumeric characters are added. In other words, I want to keep all entries except the second one, as all of them mention FLT1, but the second one mentions FLT1P1.

When I use str_detect, it returns TRUE for every element:

str_detect(string, "FLT1")
[1] TRUE TRUE TRUE TRUE

Can anyone advise on the best method to only return the items that mention FLT1?

Wiktor Stribiżew
icedcoffee
  • you can anchor your regex – Bruno Jun 17 '20 at 13:45
  • 3
    Don't use regex for full-string exact matches. `string == "FLT1"` – Gregor Thomas Jun 17 '20 at 13:45
  • If I use string == "FLT1" it misses out entries such as "FLT1-FLT2" – icedcoffee Jun 17 '20 at 13:47
  • Okay, so when you say *"but not when other characters are added"*, it seems you don't include `"-"` in your definition of character. Do you mean "but not when other letters are added"? "...not other letters and numbers"? "Anything but `-`"? Something else? – Gregor Thomas Jun 17 '20 at 13:48
  • you can try `str_detect(string, "FLT1(\\W|$)")`, specifying no alphanum characters (i.e. a character that is not alphanumeric or the end of string) after FLT1 – Cath Jun 17 '20 at 13:50

4 Answers

8

Word boundaries with \\b will probably work. A word boundary matches at the beginning or end of the string and at any transition between a word character (a letter, digit, or underscore) and a non-word character.

str_detect(string, "\\bFLT1\\b")
[1]  TRUE FALSE  TRUE  TRUE
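
One edge case worth checking is the underscore, which counts as a word character. The following is a minimal sketch in base R (using `grepl` with `perl = TRUE` rather than `str_detect`, so it needs no packages); the two extra inputs `"FLT1_A"` and `"FLT1 GPD"` are hypothetical additions, not part of the original data.

```r
# Original data plus two hypothetical entries to probe the edges of \\b.
x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD", "FLT1_A", "FLT1 GPD")

# \\b requires a word/non-word transition on both sides of FLT1.
res_boundary <- grepl("\\bFLT1\\b", x, perl = TRUE)

# "FLT1_A" is rejected because "_" is a word character;
# "FLT1 GPD" is kept because a space is not.
x[res_boundary]
```

If entries such as `"FLT1_A"` should be kept, a lookaround on an explicit character class may fit better than `\\b`.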
Gregor Thomas
1

Use lookarounds:

library(stringr)

x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD","AFLT1")

x %>% 
  str_detect("(?<![:alpha:])FLT1(?![:alpha:])")
#> [1]  TRUE FALSE  TRUE  TRUE FALSE

Created on 2020-06-17 by the reprex package (v0.3.0)
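
Note that a lookaround on `[:alpha:]` only rules out neighbouring letters, so an entry like `"FLT12"` would still match. Since the question talks about alphanumeric characters, here is a hedged sketch comparing a letters-only class with `[[:alnum:]]`. It uses base R `grepl` with `perl = TRUE` instead of stringr, and the inputs `"FLT12"`, `"FLT1_A"`, and `"AFLT1"` are hypothetical test cases.

```r
# Hypothetical inputs chosen to separate letters, digits, and underscore.
x <- c("FLT1", "FLT1P1", "FLT12", "FLT1_A", "AFLT1")

# Letters only: a digit or underscore next to FLT1 is still allowed.
alpha_only <- grepl("(?<![[:alpha:]])FLT1(?![[:alpha:]])", x, perl = TRUE)

# Letters and digits: "FLT12" is now rejected, "FLT1_A" is still allowed.
alnum <- grepl("(?<![[:alnum:]])FLT1(?![[:alnum:]])", x, perl = TRUE)

data.frame(x, alpha_only, alnum)
```

Unlike `\\b`, the `[[:alnum:]]` version treats the underscore as a separator, so which one is right depends on how the identifiers are delimited.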

Bruno
1

To me, "no other characters added" means a word boundary, which is expressed by \\b.

x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")
stringr::str_detect(x, "FLT1\\b")
# [1]  TRUE FALSE  TRUE  TRUE

Or base R:

grepl("FLT1\\b", x)
# [1]  TRUE FALSE  TRUE  TRUE
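
A trailing `\\b` only guards the right-hand side of the match. The sketch below, with hypothetical inputs `"AFLT1"` and `"1FLT1"` that are not in the original data, shows the difference from a boundary on both sides.

```r
# Hypothetical inputs with extra characters in front of FLT1.
x <- c("FLT1", "FLT1P1", "AFLT1", "1FLT1")

# Boundary after FLT1 only: prefixes such as "A" or "1" are not checked.
trailing_only <- grepl("FLT1\\b", x, perl = TRUE)

# Boundary on both sides: prefixed entries are rejected as well.
both_sides <- grepl("\\bFLT1\\b", x, perl = TRUE)

data.frame(x, trailing_only, both_sides)
```

For the data in the question both patterns give the same result, so the trailing-only form is enough unless prefixed names can occur.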
jay.sf
1

The best way is to use \\b, as noted by others. Alternatively, you can use a positive lookahead:

Data:

x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")

Solution:

grep("FLT1(?=$|-|,)", x, perl = TRUE, value = TRUE)
[1] "FLT1"          "FLT1-FLT2"     "SGY-FLT1, GPD"

Here, grep matches FLT1 if, and only if, the very next thing is the end of the string ($), a hyphen (-), or a comma (,). By implication, it does not match when the next character is, for example, alphanumeric.

Or, if the rule is that you want to exclude values where alphanumeric characters are added, you can use a negative lookahead:

grep("FLT1(?!\\w)", x, perl = TRUE, value = TRUE)
[1] "FLT1"          "FLT1-FLT2"     "SGY-FLT1, GPD"
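
The two lookaheads agree on the data above but diverge once other separators appear. As a small sketch, assuming hypothetical entries `"FLT1 GPD"` and `"FLT1;GPD"` that are not in the original data:

```r
# Original data plus two hypothetical entries using other separators.
x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD", "FLT1 GPD", "FLT1;GPD")

# Positive lookahead: only end of string, "-" or "," may follow FLT1.
positive <- grep("FLT1(?=$|-|,)", x, perl = TRUE, value = TRUE)

# Negative lookahead: anything except a word character may follow FLT1.
negative <- grep("FLT1(?!\\w)", x, perl = TRUE, value = TRUE)

# Entries accepted by the negative lookahead but missed by the positive one.
setdiff(negative, positive)
```

The positive form is an allow-list that has to name every separator, while the negative form is a deny-list that only names what must not follow.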
Chris Ruehlemann