I need a regular expression that matches a set of words regardless of their order. For example, these lines should match over the marked range:
A longword1 B longword2 C
  ^-------------------^
A longword2 B longword1 C
  ^-------------------^
while these shouldn't:
A longword1 B longword1 C
A longword2 B longword2 C
A longword1 B
A longword2 C
(A, B, C are fillers, they can be essentially any text)
It is possible to just use alternation, such as:
\b((longword1).*?(longword2)|(longword2).*?(longword1))\b
But the regex would grow factorially, i.e. three words would need 3! = 6 alternatives. It's also possible to use subroutines, e.g.:
\b((?'A'longword1).*?(?'B'longword2)|(?P>B).*?(?P>A))\b
Although shorter, this would still need to include every permutation.
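To make the factorial growth concrete, here is a minimal Python sketch that builds the alternation approach from every ordering of the words. The helper names `permutation_regex` and `matched_range` are made up for this illustration, and the outer group is non-capturing for brevity. Python's `re` module does not support PCRE subroutine calls, so only the plain alternation form is shown.

```python
import re
from itertools import permutations


def permutation_regex(words):
    """Build an alternation with one branch per ordering of `words` (hypothetical helper)."""
    branches = (
        ".*?".join(re.escape(word) for word in order)
        for order in permutations(words)
    )
    return re.compile(r"\b(?:" + "|".join(branches) + r")\b")


def matched_range(pattern, line):
    """Return the matched text, or None when the line does not match."""
    match = pattern.search(line)
    return match.group(0) if match else None


two_words = permutation_regex(["longword1", "longword2"])
three_words = permutation_regex(["longword1", "longword2", "longword3"])

# Branch counts are n!: 2 branches for two words, 6 for three.
two_branch_count = two_words.pattern.count("|") + 1
three_branch_count = three_words.pattern.count("|") + 1

print(matched_range(two_words, "A longword1 B longword2 C"))
print(matched_range(two_words, "A longword1 B longword1 C"))
```

The first call returns only the range between the two words, and the second returns `None` because the same word appears twice.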
I've read this post and this other one, but the accepted answers don't exactly solve my problem. Using \b(?=.*longword1)(?=.*longword2).*\b would match the whole line instead of the range I've shown.
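The whole-line behaviour of the lookahead pattern can be reproduced with a short Python sketch. The variable names are illustrative only, and I am assuming Python's `re` behaves like the regexr engine for this particular pattern.

```python
import re

# The lookahead pattern from the linked answers, unchanged.
lookahead = re.compile(r"\b(?=.*longword1)(?=.*longword2).*\b")

line = "A longword1 B longword2 C"
match = lookahead.search(line)

# The lookaheads only assert that both words occur somewhere ahead;
# the greedy .* then consumes everything from the first word boundary onward.
whole_line_matched = match is not None and match.group(0) == line

# A line missing one of the words is rejected, which is the part that works.
missing_word = lookahead.search("A longword1 B")

print(match.span() if match else None)
```

So the pattern correctly decides whether a line qualifies, but it reports the full line rather than the range between the two words.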
I understand that it would be much easier to check the sentence against a list of words, but my current use case rules that out; I can only use regexes.
Here are some links to demonstrate what I mean:
EXPECTED:
- Using alternates: https://regexr.com/5b6pv
- Using subroutines: https://regexr.com/5b6ss
INCORRECT:
- Using positive lookahead (as linked): https://regexr.com/5b6q2
Is there a simpler regex that handles this without listing every permutation?