Imagine I have the string abcdefghi. If I apply the regular expression

m/([a-z])([a-z])/g

to it, I get disjoint pairs ab, cd, ef, gh.

What I want is all overlapping pairs ab, bc, cd, de, ef, fg, gh, hi.

When I use a lookahead, like

m/([a-z])(?=[a-z])/g

I get the first letter of each pair a, b, c, d, e, f, g, h, but the letter matched by the lookahead is not kept in the result.

How can I tell the regex engine that I want the first letter but also the lookahead, in order to obtain pairs of letters ab, bc, cd, de, ef, fg, gh, hi?

yannis
  • You can also capture what's inside the lookahead, like this `([a-z])(?=([a-z]))` – Sweeper Sep 14 '19 at 15:16
  • See for example https://stackoverflow.com/questions/20833295/how-can-i-match-overlapping-strings-with-regex or https://stackoverflow.com/questions/11430863/how-to-find-overlapping-matches-with-a-regexp – The fourth bird Sep 14 '19 at 15:24

2 Answers

The () around lookaheads are non-capturing, and because lookaheads are 0-width matches, you don't get the characters that are "looked at" in the result.

You just need to make the contents of the lookahead capturing by surrounding them with a capturing group:

([a-z])(?=([a-z]))
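
As a minimal sketch of how the two groups combine into pairs, here is the same pattern in Python (the question uses Perl-style syntax; Python and the function name `overlapping_pairs` are just assumptions for illustration). Group 1 is the consumed letter and group 2 is the letter seen by the lookahead:

```python
import re

def overlapping_pairs(text):
    # Group 1 consumes one letter; group 2 captures the next letter
    # inside the lookahead without consuming it.
    pattern = re.compile(r"([a-z])(?=([a-z]))")
    # findall returns one (group1, group2) tuple per match.
    return [first + second for first, second in pattern.findall(text)]

print(overlapping_pairs("abcdefghi"))
```

Because only one letter is consumed per match, the next match starts at the letter that was just looked at, which is what makes the pairs overlap.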

On a side note, there are other ways to get overlapping pairs, such as a for loop over the start indices from 0 through (the string's length - 2). You might want to consider these options as well.
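
A rough sketch of the loop-based alternative, again in Python as an assumption (the helper name `pairs_by_index` is hypothetical). Note that, unlike the regex, this version does not check that the characters are in a-z:

```python
def pairs_by_index(text):
    pairs = []
    # range(len(text) - 1) yields start indices 0 .. len(text) - 2.
    for i in range(len(text) - 1):
        # Slice out the two characters starting at index i.
        pairs.append(text[i:i + 2])
    return pairs

print(pairs_by_index("abcdefghi"))
```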

Sweeper

You can do it by relying on the engine's bump-along feature: use a zero-width
assertion containing a single capture group that holds each pair.

Since such a match does not consume any characters, the engine falls back on
its built-in mechanism for avoiding an endless loop, which is to advance the
current position by 1 before the next attempt.

(?=([a-z]{2}))

https://regex101.com/r/GYcgiZ/1
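
For illustration, a small sketch of this pattern in Python (an assumption; the regex101 link may use a different flavor, and `pairs_via_lookahead` is a hypothetical name). Python's `re.findall` also steps forward by one position after an empty match, so every start position gets tried:

```python
import re

def pairs_via_lookahead(text):
    # The whole match is empty; the pair lives in capture group 1.
    # With a single group, findall returns the group's text per match.
    return re.findall(r"(?=([a-z]{2}))", text)

print(pairs_via_lookahead("abcdefghi"))
```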

Or,

You can advance the position yourself by capturing 2 characters in the lookahead and consuming 1.

(?=([a-z]{2})).

https://regex101.com/r/re917b/1
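
To make the "match 2, consume 1" behaviour visible, here is a hedged Python sketch (the language and the helper name `pairs_with_spans` are assumptions) that returns each captured pair together with the span of the text actually consumed:

```python
import re

def pairs_with_spans(text):
    # The lookahead captures two letters; the trailing "." consumes one
    # character, so each match has width 1 and the next starts right after it.
    pattern = re.compile(r"(?=([a-z]{2})).")
    return [(m.group(1), m.span()) for m in pattern.finditer(text)]

for pair, span in pairs_with_spans("abcd"):
    print(pair, span)
```

Each span is one character wide, so the engine moves forward through normal consumption rather than through its empty-match handling.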