One way to do the key part with regex is by using a feature that allows code execution inside regex, to build a frequency hash (map, dictionary). That can then be used to select words that repeat.
Doing it all in regex alone, not ever using any language or tool, isn't really possible (except perhaps using recursion if supported). The rest of this post employs basics of a programming language, in the context of the regex feature that allows code execution in regex.
A feature that I think is available in most engines is running code to prepare the replacement string. I use Perl for the following short samples. What is of interest here is done using a side effect, which of course isn't optimal (and usually results in clumsy-looking code).
Either run the normal substitution and put back the matched word
$string =~ s/(\w+)/++$freq{$1}; $1/eg;
or use the "non-destructive" substitution, which doesn't change the target (and returns the changed string or the original if the match failed)
my $toss = $string =~ s/(\w+)/++$freq{$1}/egr;
The returned string is unneeded for the task as described. In both cases we run a substitution on each word when this isn't what we actually need.
Then print keys (words) with frequencies larger than 1
foreach my $word (keys %freq) { say $word if $freq{$word} > 1 }
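Put together, the substitution approach above might look like the following minimal sketch. The sample string is made up for illustration, and the `/r` modifier assumes Perl 5.14 or newer.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample input, not taken from the question
my $string = 'the cat saw the dog and the dog saw a bird';

my %freq;

# /e evaluates the replacement as code; /r returns the changed copy and
# leaves $string alone. Only the side effect (filling %freq) matters here.
my $toss = $string =~ s/(\w+)/++$freq{$1}/egr;

# Words seen more than once, sorted so the output order is stable
my @repeated = grep { $freq{$_} > 1 } sort keys %freq;

say for @repeated;
```

With this input the words the, saw and dog each occur more than once, so those are the ones printed.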
The regex matches "words" per the \w character class; adjust as suitable for your needs.
Altogether, since this is a tricky task for a regex, I'd recommend splitting the string into words and counting duplicates using your language's features, rather than pushing the regex. Any language worth its salt can do this rather elegantly and efficiently.
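As a rough sketch of that recommendation in Perl (one way among many, with a made-up sample string), one can split on non-word characters and count with a hash:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample input, not taken from the question
my $string = 'the cat saw the dog and the dog saw a bird';

my %freq;

# Split on runs of non-word characters. The grep drops the empty leading
# field that split produces when the string starts with a non-word character.
++$freq{$_} for grep { length } split /\W+/, $string;

# Words seen more than once, sorted so the output order is stable
my @repeated = grep { $freq{$_} > 1 } sort keys %freq;

say for @repeated;
```

The regex here only defines what separates words; the counting itself is plain hash bookkeeping.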
With Perl, another way would be to use the embedded code construct, which allows code to run in the matching part. As far as I know, that is not available in other languages, save for some libraries.
One can run code in the matching part like so
my @matches = $string =~ /(\w+)(?{++$freq{$1}})/g;
where the construct (?{code}) will execute embedded code, here building the frequency hash. Using this feature (safely) requires close reading of documentation.
The @matches above has all words, and is not of interest in the problem as stated; it is used here to put the regex's match operator in the "list context" so that the search continues through the string via the /g modifier.
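For completeness, here is a small self-contained sketch of the embedded-code variant. The sample string is made up, and declaring the hash with `our` is my own cautious choice, since lexical variables inside `(?{...})` blocks were unreliable in older Perl versions.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample input, not taken from the question
my $string = 'the cat saw the dog and the dog saw a bird';

# A package variable, to stay clear of lexical-scoping issues
# in (?{...}) blocks on older perls
our %freq;

# The code block runs each time (\w+) has matched a word;
# list context plus /g keeps the match going through the whole string
my @matches = $string =~ /(\w+)(?{ ++$freq{$1} })/g;

# Words seen more than once, sorted so the output order is stable
my @repeated = grep { $freq{$_} > 1 } sort keys %freq;

say for @repeated;
```

Since nothing follows the code block in the pattern, the engine has no reason to backtrack over it, so each word is counted once.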