5

Given this vector:

ba <- c('baa','aba','abba','abbba','aaba','aabba')

I want to change the final a of each word to i, except in baa and aba.

I wrote the following line ...

gsub('(?<=a[ab]b{1,2})a','i',ba,perl=TRUE)

but was told: PCRE pattern compilation error 'lookbehind assertion is not fixed length' at ')a'.

I looked around a little, and apparently R's Perl-compatible regex engine allows variable-width lookahead but not variable-width lookbehind. Is there a workaround for this problem? Thanks!

hwnd
dasf

2 Answers

7

You can use the lookbehind alternative \K instead. This escape sequence resets the starting point of the reported match and any previously consumed characters are no longer included.

Quoted from rexegg:

The key difference between \K and a lookbehind is that in PCRE, a lookbehind does not allow you to use quantifiers: the length of what you look for must be fixed. On the other hand, \K can be dropped anywhere in a pattern, so you are free to have any quantifiers you like before \K.
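As a minimal sketch of that point, the snippet below places an unbounded quantifier before \K. The input vector `x` is a made-up illustration and is not taken from the question.

```r
# Hypothetical input, chosen only for illustration
x <- c("ab", "aaab", "b")

# a+ is unbounded, so it could not go inside a PCRE lookbehind.
# With \K the a's are still consumed, but they are dropped from the
# reported match, so only the trailing b is replaced.
res <- sub("a+\\Kb", "X", x, perl = TRUE)
print(res)
# [1] "aX"   "aaaX" "b"
```

The third element is left unchanged because there is no a before the b, so the pattern does not match at all.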

Using it in context:

sub('a[ab]b{1,2}\\Ka', 'i', ba, perl=TRUE)
# [1] "baa"   "aba"   "abbi"  "abbbi" "aabi"  "aabbi"

Avoiding lookarounds:

sub('(a[ab]b{1,2})a', '\\1i', ba)
# [1] "baa"   "aba"   "abbi"  "abbbi" "aabi"  "aabbi"
hwnd
  • Can I also ask if there is an equivalent of `\\K` in the other direction, i.e. resetting the end point of the reported match? – dasf Mar 28 '15 at 00:21
  • Yes, `\G` if I follow what you are asking. – hwnd Mar 28 '15 at 18:48
2

Another solution, which works in this case only because the sole quantifier inside the lookbehind is a limiting (bounded) one, is to use stringr::str_replace_all / stringr::str_replace:

> library(stringr)
> str_replace_all(ba, '(?<=a[ab]b{1,2})a', 'i')
[1] "baa"   "aba"   "abbi"  "abbbi" "aabi"  "aabbi"

It works because stringr regex functions are based on ICU regex that features a constrained-width lookbehind:

The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)

So you cannot use arbitrary patterns inside ICU lookbehinds, but it is good to know that a limiting quantifier is allowed there when you need to match overlapping text within a known distance range.
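A rough sketch of that constraint follows. It assumes the stringr package is installed, and the input vector `x` is a made-up illustration. The second call is wrapped in tryCatch because an unbounded + inside the lookbehind is expected to be rejected by ICU; the exact error text is not reproduced here.

```r
library(stringr)

# Hypothetical input, chosen only for illustration
x <- c("aba", "abba", "abbbba")

# Bounded quantifier {1,2} inside the lookbehind: accepted by ICU
bounded <- str_replace_all(x, "(?<=ab{1,2})a", "i")
print(bounded)

# Unbounded + inside the lookbehind: expected to raise an error under ICU,
# so capture the condition instead of letting it stop the script
unbounded_fails <- inherits(
  tryCatch(str_replace_all(x, "(?<=ab+)a", "i"), error = function(e) e),
  "error"
)
print(unbounded_fails)
```

The bounded call should give "abi" and "abbi" and leave "abbbba" untouched, since its final a is preceded by four b's, which is outside the {1,2} range.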

Wiktor Stribiżew