-2

First of all, I am aware that this is a problem you wouldn't usually use regex for, I am just trying to find out whether this is even possible.

That being said, what I am trying to do is match ALL occurrences of any permutation of a string (for now, I don't care if overlapping occurences match or not); for example, if I have the string abc, I want to match all occurrences of abc, acb, bac, bca, cab and cba.

What I have until now is the following regex: (?:([abc])(?!.{0,1}\1)){3} (note: I know that I could use + instead of {0,1}, but that only works for strings with length 3). This kind of works, but if there are two permutations next to each other where a letter of the first one is too close to a letter of the second one (eg. abc cbac c), the first permutation does not match. Is it possible to solve this using regex?

Mark
  • 143,421
  • 24
  • 428
  • 436
Lukor
  • 1,485
  • 2
  • 14
  • 25

1 Answers1

2

Direct Approach

[abc]{3} would match too many results since it would also match aab. In order to not double match a you would need to remove a from the group that follows leaving you with a[bc]{2}.

a[bc]{2} would match too many results since it would also match 'abb'. In order to not double match b you would need to remove a from the group that follows leaving you with ab[c]{1} or abc for short.

abc would not match all combinations so you would need another group.

(abc)|([abc]{3}) which would match too many combinations again.

This path leads you down the road of having all permutations listed explicitly in groups.

Can you create combinations so that you do not need to write out all combinations?

(abc)|(acb) could be writtean as a((bc)|(cb)).

(bc)|(cb) I can not shorten that any further.

Match too many and remove unwanted

Depending on the regex engine you may be able to express AND as a look ahead so that you can remove matches. THIS and not THAT consume THIS.

(?=[abc]{3})(?=(?!a.a))[abc]{3} would not match aca.

This problem is now simmilar to the one above where you need to remove all combinations that would violate your permutations. In this example that is any expression containing the same character mutltiple times.

'(.)\1+' this expression uses grouping references on its own matches the same character multiple times but requires knowing how many groups exist in the expression and is very brittle Adding groups kills the expression ((.)\1+) no longer matches. Relative back references exist and require knowledge of your specific regex engine. \k<-1> may be what you could be looking for. I will assume .net since I happen to have a regex tester bookmarked for that.

The permutations that I want to exclude are: nn. n.n .nn nnn

So I create these patterns: ((?<1>.)\k<1>.) ((?<2>.).\k<2>) (.(?<3>.)\k<3>) ((?<4>.)\k<4>\k<4>)

Putting it all together gives me this expression, note that I used relative back references as they are in .net - your milage may vary.

(?=[abc]{3})(?=(?!((?<1>.)\k<1>.)))(?=(?!((?<2>.).\k<2>)))(?=(?!(.(?<3>.)\k<3>)))(?=(?!((?<4>.)\k<4>\k<4>)))[abc]{3}

The answer is yes for a specific length.

Here is some testing data.

Johannes
  • 6,490
  • 10
  • 59
  • 108