There is no strictly better solution, but there's an alternative.
Now, if you have two normal words like "fat" and "cat", then (fat cat|cat fat)
is undoubtedly the best solution. But what if you have 5 words? Or if you have more complex patterns than just fat
and cat
that you don't want to type twice?
Say instead of fat
and cat
you have 3 regex patterns A
, B
and C
, and instead of the space between fat
and cat
you have the regex pattern S
. In that case, you could use this recipe:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B)|(?!\4)()(?:C))){3}
If you don't have an S
, this can be simplified to
(?:(?!\1)()(?:A)|(?!\2)()(?:B)|(?!\3)()(?:C)){3}
(Note: (?:X)
can be simplified to X
if X
doesn't contain an alternation |
.)
Example
If we set A
= fat
, B
= cat
and S
= space, we get:
(?:(?:(?!\1)()|\1 )(?:(?!\2)()fat|(?!\3)()cat)){2}
Try it online.
Explanation
In essence, we're using capture groups to "remember" which patterns have already matched. To do so, we use this little pattern here:
(?!\1)()some_pattern
What does this do? It's a regex that matches exactly once. Once it has matched, it won't ever match again. If you try to add a quantifier around that pattern like (?:(?!\1)()some_pattern)*
it'll match either once or won't match at all.
The trick there is the usage of a backreference to a capture group before that group has even been defined. Because capture groups are initialized with a "failed to match" state, the negative lookahead (?!\1)
will match successfully - but only the first time. Because right afterwards, the capture group ()
matches and captures the empty string. From this point forward, the negative lookahead (?!\1)
will never match again.
With this as a building block, we can create a regex that matches fatcat
and catfat
while only containing the words fat
and cat
once:
(?:(?!\1)()fat|(?!\2)()cat){2}
Because of the negative lookaheads, each word can only match at most once. Adding a {2}
quantifier at the end guarantees that each of the two words matches exactly once, or the entire match fails.
Now we just need to find a way to match a space between fat
and cat
. Well, that's just a slight variation of the same pattern:
(?:(?!\1)()|\1 )
This pattern will match the empty string on its first match, and on each subsequent match it'll match a space.
Put it all together, and voilà:
(?:(?:(?!\1)()|\1 )(?:(?!\2)()fat|(?!\3)()cat)){2}
Templates (for the lazy)
2 patterns A
and B
, with separator S
:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B))){2}
3 patterns A
, B
and C
, with separator S
:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B)|(?!\4)()(?:C))){3}
4 patterns A
, B
, C
and D
, with separator S
:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B)|(?!\4)()(?:C)|(?!\5)()(?:D))){4}
2 patterns A
and B
, without S
:
(?:(?!\1)()(?:A)|(?!\2)()(?:B)){2}
3 patterns A
, B
and C
, without S
:
(?:(?!\1)()(?:A)|(?!\2)()(?:B)|(?!\3)()(?:C)){3}
4 patterns A
, B
, C
and D
, without S
:
(?:(?!\1)()(?:A)|(?!\2)()(?:B)|(?!\3)()(?:C)|(?!\4)()(?:D)){4}