Using R I would like to get from a vector of strings all strings that end with _XXX except those that have the word "Total" before _XXX.

mystringvector <- c("str1_XXX","str2_XXY","str3_XXX","Total_XXX")
grep("*_XXX",mystringvector,value=TRUE)

The result should contain only str1_XXX and str3_XXX. How can I add the exception for Total_?

Ali
Johannes
  • `grep("(?<!Total)_XXX",mystringvector,value=TRUE, perl=TRUE)` `# [1] "str1_XXX" "str3_XXX"` – Cath Jul 13 '17 at 10:44
  • @Cath Maybe even `"(?<!^Total)_XXX"` with `perl=TRUE` if `SomeTotal_XXX` should be matched. Or just make sure there is a digit before `_`: `grep("\\d_XXX",mystringvector,value=TRUE)` – Wiktor Stribiżew Jul 13 '17 at 10:45
  • @Cath Ok, trimmed my comment. – Wiktor Stribiżew Jul 13 '17 at 10:46
  • @WiktorStribiżew maybe but I really doubt that the real-life strings are `str1`, `str2` ;-) – Cath Jul 13 '17 at 10:50
  • @Cath That is why I am not a fan of answering "oversimplified" questions. The examples should be real-life ones as the solution can be much better than a generic one – Wiktor Stribiżew Jul 13 '17 at 10:51
  • @WiktorStribiżew I can understand but to me it's more like if OP states the strings are "sometext_XXX", like "firststring_XXX", "secondstring_XXY", etc. it's different from having not reproducible example imo – Cath Jul 13 '17 at 10:54

1 Answer

You can use a negative lookbehind, turning on the perl option, to specify that _XXX must not be preceded by Total:

grep("(?<!Total)_XXX", mystringvector, value=TRUE, perl=TRUE) 
# [1] "str1_XXX" "str3_XXX"

(?< opens a lookbehind, i.e. a condition on what comes immediately before, and ! negates it, so (?<!Total) means "not preceded by Total".
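Note that the pattern above matches _XXX anywhere in the string, not only at the end. A minimal sketch of how an end anchor changes the result, assuming two extra test strings ("str4_XXX_old" and "SomeTotal_XXX") that are not in the question and are added here only for illustration. The last variant avoids regex entirely and is just one possible alternative:

```r
# The two last elements are hypothetical additions for illustration only
mystringvector <- c("str1_XXX", "str2_XXY", "str3_XXX", "Total_XXX",
                    "str4_XXX_old", "SomeTotal_XXX")

# Lookbehind only: _XXX may appear anywhere, as long as "Total" is not right before it
unanchored <- grep("(?<!Total)_XXX", mystringvector, value = TRUE, perl = TRUE)
print(unanchored)
# [1] "str1_XXX"     "str3_XXX"     "str4_XXX_old"

# Adding $ requires _XXX to be at the end of the string
anchored <- grep("(?<!Total)_XXX$", mystringvector, value = TRUE, perl = TRUE)
print(anchored)
# [1] "str1_XXX" "str3_XXX"

# Regex-free alternative (endsWith() needs R >= 3.3.0):
# keep strings ending in _XXX, drop those ending in Total_XXX
keep <- endsWith(mystringvector, "_XXX") & !endsWith(mystringvector, "Total_XXX")
base_only <- mystringvector[keep]
print(base_only)
# [1] "str1_XXX" "str3_XXX"
```

Both the anchored pattern and the endsWith() version also exclude "SomeTotal_XXX"; whether that is desired depends on the real data.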

Cath