I'm stuck writing this regex. I tried combining look-ahead and look-behind, but I couldn't use the capture group inside the look-behind. I need to extract a character from a string ONLY if it occurs exactly 4 times in a row.

If I have these strings:

  • 3346AAAA44
  • 3973BBBBBB44
  • 9755BBBBBBAAAA44

The first one will match because it has 4 A's in a row. The second one will NOT match because it has 6 B's in a row. The third one will match because it still has exactly 4 A's in a row, despite the 6 B's. What makes it even more frustrating is that it can be any character from A to Z occurring 4 times.

Positioning does not matter.

EDIT: My attempt at the regex, which doesn't work:

(([A-Z])\2\2\2)(?<!\2*)(?!\2*)
Styn

3 Answers

If lookbehind is allowed: after capturing the character, use a negative lookbehind for \1. (if that matches, the captured first character is preceded by the same character, so the run started earlier). Then backreference the group 3 times, and add a negative lookahead for \1:

`3346AAAA44
3973BBBBBB44
9755BBBBBBAAAA44`
.split('\n')
.forEach((str) => {
  console.log(str.match(/([a-z])(?<!\1.)\1{3}(?!\1)/i));
});
  • ([a-z]) - Capture a character
  • (?<!\1.) - Negative lookbehind: check that the character just captured is not preceded by the same character (the . steps back over the captured character itself, and \1 is then tested against the character before it)
  • \1{3} - Match the same character that was captured 3 more times
  • (?!\1) - After the 4th match, make sure it's not followed by the same character
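As a rough sketch of how the pattern above might be used to collect every qualifying run in a string (the helper name findRunsOfFour is hypothetical, and this assumes a JavaScript engine that supports lookbehind and String.prototype.matchAll):

```javascript
// Hypothetical helper: returns every run of exactly four identical letters.
function findRunsOfFour(str) {
  // Same pattern as above, with the g flag added so matchAll can iterate.
  const pattern = /([a-z])(?<!\1.)\1{3}(?!\1)/gi;
  return Array.from(str.matchAll(pattern), (m) => ({ run: m[0], index: m.index }));
}

console.log(findRunsOfFour('3346AAAA44'));       // one run: 'AAAA' at index 4
console.log(findRunsOfFour('3973BBBBBB44'));     // no runs: []
console.log(findRunsOfFour('9755BBBBBBAAAA44')); // one run: 'AAAA' at index 10
```

Note that with the i flag the backreference also compares case-insensitively, so a mixed-case run such as aAaA would count as well. Dropping the flag and using [A-Z] restricts the match to uppercase runs.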
CertainPerformance

Another version without lookbehind (see demo). The sequence of 4 equal characters will be captured in Group 2.

(?:^|(?:(?=(\w)(?!\1))).)(([A-Z])\3{3})(?:(?!\3)|$)
  • (?:^|(?:(?=(\w)(?!\1))).) - either match at the beginning of the string, or consume one character that differs from the character following it, so the run cannot be preceded by the same character
  • (([A-Z])\3{3}) - capture 4 repeated [A-Z] chars
  • (?:(?!\3)|$) - ensure the first char after those 4 is different, or that it's the end of the string

As suggested by bobble bubble in the comments, the expression above can be simplified to (demo):

(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)
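To illustrate how Group 2 could be read out in practice, here is a small sketch in JavaScript (the language choice and the helper name extractRun are assumptions, not part of the original expression):

```javascript
// Hypothetical helper: returns the run of four equal uppercase letters, or null.
function extractRun(str) {
  // Simplified lookbehind-free pattern from above.
  const pattern = /(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)/;
  const m = str.match(pattern);
  // The full match may include one leading character, so read Group 2 instead.
  return m ? m[2] : null;
}

for (const s of ['3346AAAA44', '3973BBBBBB44', '9755BBBBBBAAAA44', 'AAAA', 'AAAAA']) {
  console.log(s, '-->', extractRun(s));
}
```

Because the (\w) alternative consumes the preceding character, the overall match can be five characters long, and only Group 2 holds the four-letter run. One caveat of this sketch: the preceding character has to match \w, so a run that follows a non-word character such as a space is not found.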
AndreyCh
  • Thanks for pointing out the failing case and for the ideas on how to fix it. I've slightly restructured the regex, so it should handle the `AAAAA` case properly now. – AndreyCh Dec 04 '19 at 12:46
  • Smart update, looks really nice! I already upvoted :) I wonder if it would still work [a bit shorter like this](https://regex101.com/r/pDFDTB/7). However, hats off :) – bobble bubble Dec 04 '19 at 13:10
  • Your shortened version is very nice! It seems to work well in all scenarios. – AndreyCh Dec 04 '19 at 13:27

Another variant is to capture the first char in group 1.

Then assert that the 2 chars to the left of the current position are not both the same as group 1 (the second of them is the captured char itself, so this effectively checks the char before it), and match group 1 an additional 3 times, for a total of 4 of the same char.

Finally, assert that what is on the right is not group 1.

([A-Z])(?<!\1\1)\1{3}(?!\1)
  • ([A-Z]) Capture group 1, match a single char A-Z
  • (?<!\1\1) Negative lookbehind, assert what is on the left is not 2 times group 1
  • \1{3} Match 3 times group 1
  • (?!\1) Assert what is on the right is not group 1

For example

let pattern = /([A-Z])(?<!\1\1)\1{3}(?!\1)/g;
[
  "3346AAAA44",
  "3973BBBBBB44",
  "9755BBBBBBAAAA44",
  "AAAA",
  "AAAAB",
  "BAAAAB"
].forEach(s =>
  console.log(s + " --> " + s.match(pattern))
);
The fourth bird