I am looking for a pure regex solution that matches odd-length substrings made up of a single repeated character.

For example, my string:

hjasaaasjasjbbbbbashjasccccccc

So, the resulting matches should be:

[aaa],[bbbbb],[ccccccc]

So far, I have tried:

(?<!\1\1)*(?<!\1)(.)(\1\1)*(?:\1\1)*(?!\1)

But it's not working.

user786
  • You may use a pattern like `(.)(?<!\1{2})(?:\1\1)+(?!\1)`, see [demo](http://regexstorm.net/tester?p=%28.%29%28%3f%3c!%5c1%7b2%7d%29%28%3f%3a%5c1%5c1%29%2b%28%3f!%5c1%29&i=hjasaaasjasjbbbbbashjasccccccc%0d%0aaabbccc). – Wiktor Stribiżew Mar 17 '19 at 09:57
  • @WiktorStribiżew That one is better. It was stupid of me to add two redundant `\1` before and after the group :-D Well, you're the king of regex, so of course it's better :-) – 41686d6564 stands w. Palestine Mar 17 '19 at 10:02
  • @AhmedAbdelhameed It is basically the same, just a bit contracted. You may add it to your answer if you wish. – Wiktor Stribiżew Mar 17 '19 at 10:11
  • @WiktorStribiżew I am trying to understand why you need `(?<!\1{2})` and `(?!\1)`. wouldn't `(.)(?:\1\1)+` do the job just as well? – Gilad Shnoor Mar 17 '19 at 11:12
  • @GiladShnoor The point of `(?<!\1{2})` (or `(?<!\1\1)` or `(?<!\1.)` or `(?<!\1[\s\S])`, they will do the same job here) is to make sure the char before the captured one is not the same char. – Wiktor Stribiżew Mar 17 '19 at 11:14
  • @WiktorStribiżew I think I didn't understand the og question. If we have the string `baaaa` we should not match anything correct? – Gilad Shnoor Mar 17 '19 at 11:27
  • @GiladShnoor Well, it can only be assumed that OP wants to match substrings of at least 3 char in length, and in `baaaa` the `b` is single and `aaaa` is a streak of even `a`s. No matches are expected. – Wiktor Stribiżew Mar 17 '19 at 11:29
  • @WiktorStribiżew Thanks for clarifying that. I thought that with `baaaa` OP wanted to get `aaa` – Gilad Shnoor Mar 17 '19 at 11:32

1 Answer

For a regex-only solution that matches a run of an odd number of the same character (excluding single-character runs):

(.)(?<!\1\1)\1(?:\1\1)*\1(?!\1)

Or a shorter version thanks to Wiktor:

(.)(?<!\1\1)(?:\1\1)+(?!\1)

Demo.

Breakdown:

(.)         # Capturing group 1: matches any single character (except a newline by default).
(?<!\1\1)   # Negative lookbehind: the two chars ending here must not both equal group 1.
            # The second of them is the captured char itself, so this really checks that
            # the char before it is different, i.e. the run starts here.
(?:\1\1)    # Non-capturing group: two more occurrences of the same char.
+           # One or more repetitions of that pair, giving 3, 5, 7, ... chars in total.
(?!\1)      # Negative lookahead: the same char must not follow, i.e. the run ends here.
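To see why both lookarounds are needed, here is a small sketch that runs the pattern with and without them on `baaaa`, the even-run case raised in the comments. The `RunFinder` class and its `FindRuns` helper are names made up for this illustration; only `Regex.Matches` is the real .NET API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class RunFinder
{
    // Returns the text of every match of `pattern` in `input`, in order.
    public static string[] FindRuns(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string input = "baaaa"; // a single 'b' followed by an even run of 'a'

        // Full pattern: the match must span the whole run, so nothing matches.
        Console.WriteLine(FindRuns(input, @"(.)(?<!\1\1)(?:\1\1)+(?!\1)").Length); // 0

        // No lookbehind: the match may start inside the run (at index 2).
        Console.WriteLine(string.Join(",", FindRuns(input, @"(.)(?:\1\1)+(?!\1)"))); // aaa

        // No lookahead: the match may stop before the run ends (indexes 1-3).
        Console.WriteLine(string.Join(",", FindRuns(input, @"(.)(?<!\1\1)(?:\1\1)+"))); // aaa
    }
}
```

Dropping either lookaround lets the engine carve an odd-length piece out of an even-length run, which is what the expected output in the question rules out.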

Demo in C#:

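using System;
using System.Text.RegularExpressions;

string input = "hjasaaasjasjbbbbbashjasccccccc";
string pattern = @"(.)(?<!\1\1)(?:\1\1)+(?!\1)";
// Each match is one complete run of odd length (3 or more characters).
var matches = Regex.Matches(input, pattern);
foreach (Match m in matches)
    Console.WriteLine(m.Value);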

Output:

aaa
bbbbb
ccccccc

If you also want to match single characters (runs of length one), replace `+` with `*`:

(.)(?<!\1\1)(?:\1\1)*(?!\1)

Demo.
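A quick sketch of how the `*` variant behaves, using the short input `abbccc` (my own example, not from the question). The `OddRunsIncludingSingles` class and `FindOddRuns` method are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration; not part of any library.
public static class OddRunsIncludingSingles
{
    // `*` instead of `+` makes the extra pairs optional, so runs of length 1 also match.
    private const string Pattern = @"(.)(?<!\1\1)(?:\1\1)*(?!\1)";

    // Returns every maximal run of odd length (1, 3, 5, ...) in `input`.
    public static string[] FindOddRuns(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // "a" is a run of 1, "bb" is an even run and is skipped, "ccc" is a run of 3.
        Console.WriteLine(string.Join(",", FindOddRuns("abbccc"))); // a,ccc
    }
}
```

Note that on the string from the question this variant would also return every isolated character (`h`, `j`, `a`, `s`, and so on), which is why the answer leads with the `+` version.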