I need to use grep to find all strings ending with G but NOT ending with STIG or VÄG in the following character vector:

test=c("EWASPG","AVOPW","SDAGSTIG","ASDVVÄG","ASDVWE","QSCIUVG","QWSFNJG")

I tried the following, but it rejects any string in which one of the letters S, T, I, V, or Ä immediately precedes the final G, instead of rejecting only strings where the G is preceded by the exact sequence STI or VÄ.

grep("[^((STI)|(VÄ))]G$", test, value=T)

# [1] "EWASPG"  "QWSFNJG"

Thanks!

I am aware of this post.

RobustTurd

1 Answer

A character class always matches a single character, so [^(STI)] would match any character except (, S, T, I or ).
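To make this concrete, here is a small sketch. The sample string `preceded_by_I` is made up for illustration: it ends in "IG" but not in "STIG", so it should be kept, yet the bracket expression rejects it.

```r
# Inside [...], parentheses and | are ordinary characters, not grouping or alternation.
preceded_by_I <- "XYZIG"   # hypothetical input: ends in "IG", but not in "STIG"

# The class [^(STI)] refuses a single "I" before the final G ...
class_result <- grepl("[^(STI)]G$", preceded_by_I)

# ... exactly as the plainly written character set [^STI()] does.
plain_result <- grepl("[^STI()]G$", preceded_by_I)

print(c(class_result, plain_result))
# [1] FALSE FALSE
```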

You can use a negative lookbehind assertion to make sure that the string doesn't end in a certain substring, but you need to enable Perl-compatible regex mode in R:

grep("(?<!STI|VÄ)G$", test, perl=TRUE, value=TRUE)

Test it live on regex101.com.
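If `perl=TRUE` is not an option, one possible alternative (a sketch, not part of the approach above) is to combine two ordinary matches: keep the strings that end in G and drop those that end in one of the excluded suffixes. The variable names `ends_in_G`, `excluded`, and `result` are illustrative.

```r
test <- c("EWASPG", "AVOPW", "SDAGSTIG", "ASDVVÄG", "ASDVWE", "QSCIUVG", "QWSFNJG")

# Strings whose last character is G
ends_in_G <- grepl("G$", test)

# Strings that end in one of the suffixes to exclude
excluded <- grepl("(STIG|VÄG)$", test)

# Keep the first group minus the second
result <- test[ends_in_G & !excluded]
print(result)
# [1] "EWASPG"  "QSCIUVG" "QWSFNJG"
```

This uses R's default regex engine, so no lookbehind is needed, at the cost of scanning the vector twice.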

Tim Pietzcker
  • In R, this would then be `grep('(?<!STI|VÄ)G$', test, perl=TRUE, value=TRUE)` – jbaums Jun 04 '14 at 11:43
  • Thanks a bunch! R seems to work fine with Tim's code without needing jbaums' extra ' . – RobustTurd Jun 04 '14 at 11:58
  • Not sure which extra quote you're referring to. Unless you're just referring to my single quotes in general, in which case they're completely exchangeable with double quotes in this situation. – jbaums Jun 04 '14 at 12:00