I'm trying to use a regex to replace the last instance of a phrase (and everything after that phrase, which could be any character):

stringi::stri_replace_last_regex("_AB:C-_ABCDEF_ABC:45_ABC:454:", "_ABC.*$", "CBA")

However, I can't seem to get the regex to work properly:

Input: "_AB:C-_ABCDEF_ABC:45_ABC:454:"
Actual output: "_AB:C-CBA"
Desired output: "_AB:C-_ABCDEF_ABC:45_CBA"

I have tried gsub() as well but that hasn't worked.

Any ideas where I'm going wrong?

SimonSchus

4 Answers

One solution is:

Input <- "_AB:C-_ABCDEF_ABC:45_ABC:454:"
sub("(.*)_ABC.*", "\\1_CBA", Input)
[1] "_AB:C-_ABCDEF_ABC:45_CBA"
G5W

Have a look at what stringi::stri_replace_last_regex does:

Replaces with the given replacement string last substring of the input that matches a regular expression

What does your _ABC.*$ pattern match inside _AB:C-_ABCDEF_ABC:45_ABC:454:? It matches the first _ABC (the one right after C-) and all the text after it to the end of the line (.*$ grabs 0+ chars other than line break chars up to the end of the line). Hence, there is only one match, and that single match is also the last one.
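
A quick way to confirm this is to extract what the pattern actually matches. The sketch below uses base R's regexpr, gregexpr and regmatches rather than stringi; the variable names are only for illustration.

```r
x <- "_AB:C-_ABCDEF_ABC:45_ABC:454:"

# Extract the substring matched by the pattern
first_match <- regmatches(x, regexpr("_ABC.*$", x))
first_match
# [1] "_ABCDEF_ABC:45_ABC:454:"

# Count all non-overlapping matches: the first one already runs to the
# end of the string, so nothing is left for a second match
n_matches <- length(regmatches(x, gregexpr("_ABC.*$", x))[[1]])
n_matches
# [1] 1
```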

Solutions can be many:

1) Capture all the text before the last occurrence of the pattern and reinsert the captured value with a replacement backreference (this pattern does not have to be anchored at the end of the string with $):

sub("(.*)_ABC.*", "\\1_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:")

2) Use a tempered greedy token so that, after _ABC is matched, each following char is consumed only if it does not start another _ABC, all the way to the end of the string (this pattern must be anchored at the end of the string with $):

sub("(?s)_ABC(?:(?!_ABC).)*$", "_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:", perl=TRUE)

Note that this pattern requires the perl=TRUE argument so that sub parses it with the PCRE engine (alternatively, you may use stringr::str_replace, which is powered by the ICU regex library and supports lookaheads).

3) Use a negative lookahead to make sure the pattern does not appear anywhere to the right of the match (this pattern does not have to be anchored at the end of the string with $):

sub("(?s)_ABC(?!.*_ABC).*", "_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:", perl=TRUE)

See the R demo online; all three lines of code return _AB:C-_ABCDEF_ABC:45_CBA.
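
To see why the first solution finds the last occurrence, it can help to look at what the greedy (.*) group captures. This is a small sketch using base R's regexec and regmatches; the variable names are illustrative.

```r
x <- "_AB:C-_ABCDEF_ABC:45_ABC:454:"

# regexec returns the whole match followed by each capture group
groups <- regmatches(x, regexec("(.*)_ABC.*", x))[[1]]

# Group 1: the greedy .* first runs to the end of the string, then
# backtracks until _ABC can match, which happens at the LAST _ABC
groups[2]
# [1] "_AB:C-_ABCDEF_ABC:45"

# Rebuilding the result by hand gives the same string as sub() with \\1
result <- paste0(groups[2], "_CBA")
result
# [1] "_AB:C-_ABCDEF_ABC:45_CBA"
```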

Note that (?s) in the PCRE patterns is necessary if your strings may contain a newline, because . in a PCRE pattern does not match newline chars by default.
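
The effect of (?s) can be checked on a made-up multi-line input (the string below is an assumption for illustration, not taken from the question):

```r
x <- "_ABC:1\n_ABC:2\nrest"

# Without (?s): . stops at the newline, so the lookahead cannot see the
# second _ABC and the FIRST occurrence is replaced, up to the line end only
without_s <- sub("_ABC(?!.*_ABC).*", "_CBA", x, perl = TRUE)
without_s
# [1] "_CBA\n_ABC:2\nrest"

# With (?s): . also matches newlines, so the last _ABC and everything
# after it are replaced
with_s <- sub("(?s)_ABC(?!.*_ABC).*", "_CBA", x, perl = TRUE)
with_s
# [1] "_ABC:1\n_CBA"
```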

Wiktor Stribiżew

Using gsub and a backreference:

gsub("(.*)ABC.*$", "\\1CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:")
[1] "_AB:C-_ABCDEF_ABC:45_CBA"
GordonShumway

Arguably the safest thing to do is to use a negative lookahead to find the last occurrence:

_ABC(?:(?!_ABC).)+$

Demo

gsub("_ABC(?:(?!_ABC).)+$", "_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:", perl=TRUE)
[1] "_AB:C-_ABCDEF_ABC:45_CBA"
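
One edge case worth checking with the pattern above: the + quantifier needs at least one char after _ABC, so a string that ends exactly with the phrase is left unchanged. The sketch below uses a made-up input and compares it with a * variant, which is an assumption about the desired behaviour rather than part of the answer.

```r
x <- "foo_ABC"

# With +, nothing follows the final _ABC, so there is no match
plus_variant <- gsub("_ABC(?:(?!_ABC).)+$", "_CBA", x, perl = TRUE)
plus_variant
# [1] "foo_ABC"

# With *, a trailing _ABC with nothing after it is replaced too
star_variant <- gsub("_ABC(?:(?!_ABC).)*$", "_CBA", x, perl = TRUE)
star_variant
# [1] "foo_CBA"
```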
wp78de