0

It seems that I can't find a solution for this perhaps an easy problem: I want to be able to match with a simple regex all possible permutations of 5 specified digits, without repeating, where all digits must be used. So, for this sequence:

12345

the valid permutation is:

54321

but

55555

is not valid.

However, if the provided digits have the same number once or more, only in that case the accepted permutations will have those repeated digits, but each digit must be used only once. For example, if the provided number is:

55432

we see that 5 is provided 2 times, so it must be also present two times in each permutation, and some of the accepted answers would be:

32545

45523

but this is wrong:

55523

(not all original digits are used and 5 is repeated more than twice)

I came very close to solve this using:

(?:([43210])(?!.*\1)){5}

but unfortunately it doesn't work when there are multiple same digits provided(like 43211).

Zed
  • 5,683
  • 11
  • 49
  • 81

2 Answers2

0

One way to solve this is to make a character class out of the search digits and build a regex to search for as many digits in that class as are in the search string. Then you can filter the regex results based on the sorted match string being the same as the sorted search string. For example:

import re

def find_perms(search, text):
    search = sorted(search)
    regex = re.compile(rf'\b[{"".join(search)}]{{{len(search)}}}\b')
    matches = [m for m in regex.findall(text) if sorted(m) == search]
    return matches

print(find_perms('54321', '12345 54321 55432'))
print(find_perms('23455', '12345 54321 55432'))
print(find_perms('24455', '12345 54321 55432'))

Output:

['12345', '54321']
['55432']
[]

Note I've included word boundaries (\b) in the regex so that (for example) 12345 won't match 654321. If you want to match substrings as well, just remove the word boundaries from the regex.

Nick
  • 138,499
  • 22
  • 57
  • 95
0

The mathematical term for this is a mutliset. In Python, this is handled by the Counter data type. For example,

from collections import Counter
target = '55432'
candidate = '32545'
Counter(candidate) == Counter(target)

If you want to generate all of the multisets, here's one question dealing with that: How to generate all the permutations of a multiset?

Acccumulation
  • 3,491
  • 1
  • 8
  • 12
  • Out of curiosity I timed comparing two counters and comparing two sorted lists (where the inputs are 5 character strings) and comparing the sorted lists is about 6x faster. Even for 10 character strings sorting is ~4x faster. https://rextester.com/IBG64719 – Nick Apr 11 '21 at 03:31