Regex: Match all duplicate characters PHP

Question

I would like to match all recurring characters in a string. So if I have:

aecffead

the regex should match all characters except c and d (because those characters occur only once).

I already have this pattern, which matches every character that occurs again later in the string:

/([a-z])(?=.*\1)/g

It matches these bold characters: **ae**c**f**fead

But it should match these bold characters: **ae**c**ffea**d

My regex doesn't match the last "fea", because those characters do not occur again later in the string. But because they already occurred earlier, I want to match them as well.

Does anyone know how to fix this? I thought about combining a positive lookahead with a positive lookbehind, but I can't get it to work.
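PCRE does not allow an unbounded (`.*`-style) lookbehind, so the "earlier duplicate" half cannot be written directly. One workaround, sketched here as two regex passes rather than a single pattern, is to run the same lookahead a second time on the reversed string: a character with an earlier duplicate in the original has a later duplicate in the reversed copy. This assumes single-byte lowercase input (as `[a-z]` implies), since `strrev()` reverses bytes; `repeatedOffsets()` is a hypothetical helper name.

```php
<?php
// Sketch: find the offsets of every character that occurs more than once,
// using only lookaheads. Assumes single-byte input, because strrev()
// reverses bytes, not multibyte characters.
// repeatedOffsets() is a hypothetical helper name.
function repeatedOffsets(string $string): array
{
    $pattern = '/([a-z])(?=.*\1)/';
    $last    = strlen($string) - 1;
    $offsets = [];

    // Pass 1: characters that have a LATER duplicate (the question's pattern).
    preg_match_all($pattern, $string, $m, PREG_OFFSET_CAPTURE);
    foreach ($m[1] as $match) {
        $offsets[$match[1]] = true;
    }

    // Pass 2: characters that have an EARLIER duplicate. In the reversed
    // string that duplicate comes later, so the same lookahead finds it;
    // map each reversed offset back to its original position.
    preg_match_all($pattern, strrev($string), $m, PREG_OFFSET_CAPTURE);
    foreach ($m[1] as $match) {
        $offsets[$last - $match[1]] = true;
    }

    $result = array_keys($offsets);
    sort($result);
    return $result;
}

print_r(repeatedOffsets("aecffead")); // offsets 0, 1, 3, 4, 5, 6
```

For "aecffead" only offsets 2 (c) and 7 (d) are missing, which are exactly the characters the question wants to keep.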

EDIT:

Just to clarify: the idea is to remove all characters that occur more than once. So not just the duplicates: if the character a occurs more than once, I want to remove every a from the string.

This doesn't look like a problem that should be solved with a PCRE regex. You would need an infinite-width lookbehind for that. — Wiktor Stribiżew, Jun 22 '17 at 14:00
Your regex has a limitation: the final occurrence of a character never matches the pattern, because no further duplicate of that character follows it. Update: based on Wiktor's comment, you might consider iterating over the string character by character and checking for duplicates some other way. — Tim Biegeleisen, Jun 22 '17 at 14:00
maybe this answer helps: https://stackoverflow.com/a/18305820/2249798 — m13r, Jun 22 '17 at 14:01
Note that this would not account for characters that *look* the same but are in fact different, such as characters from different character sets, e.g. Cyrillic `a` and Latin `a`. — Martin, Jun 22 '17 at 14:03
Agreed! You need a fixed width to look back from. What is the scope of the search? How long can your string be? — Jaymin, Jun 22 '17 at 14:03
@modsfabio, check my edited answer. @JayminsFakeAccount, there is no fixed size, but since I'm using PHP I could probably build the regex from the character count PHP returns. — Erik van de Ven, Jun 22 '17 at 14:07
Possible duplicate of [Remove all duplicate characters in a string?](https://stackoverflow.com/questions/18305797/remove-all-duplicate-characters-in-a-string) — m13r, Jun 22 '17 at 14:08
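Tim Biegeleisen's suggestion of iterating over the string can be sketched without any regex: count each character in one pass, then keep only the characters whose count is 1. This is a minimal sketch assuming single-byte input (`$string[$i]` indexes bytes, not multibyte characters); `removeRepeatedChars()` is a hypothetical name.

```php
<?php
// Sketch: remove every character that occurs more than once.
// Assumes single-byte input: $string[$i] returns one byte.
// removeRepeatedChars() is a hypothetical helper name.
function removeRepeatedChars(string $string): string
{
    $length = strlen($string);

    // Pass 1: count how often each character occurs.
    $counts = [];
    for ($i = 0; $i < $length; $i++) {
        $char = $string[$i];
        $counts[$char] = ($counts[$char] ?? 0) + 1;
    }

    // Pass 2: keep only characters seen exactly once, in original order.
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        if ($counts[$string[$i]] === 1) {
            $result .= $string[$i];
        }
    }
    return $result;
}

echo removeRepeatedChars("aecffead"), "\n"; // cd
```

Both passes are linear in the string length, so this avoids the repeated backtracking a `.*` lookahead does at every position.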

modsfabio · Accepted Answer · 2017-06-22T15:00:31.427


You can convert the string to an array:

$string = "aecffead";
var_export(array_keys(array_intersect(array_count_values(str_split($string)),[1])));

Output:

array (
  0 => 'c',
  1 => 'd',
)

This counts how often each character occurs (array_count_values()), then uses array_intersect() to keep only the characters that occur exactly once, and finally uses array_keys() to turn those characters into the values of a zero-indexed array.

Additionally, you can convert the array back to a string using implode().

Example: https://eval.in/820930
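Broken into steps, the one-liner above plus the implode() conversion might look like this. The intermediate variable names are illustrative only, and since str_split() splits by bytes, this sketch assumes single-byte input.

```php
<?php
$string = "aecffead";

$chars  = str_split($string);            // ['a', 'e', 'c', 'f', 'f', 'e', 'a', 'd']
$counts = array_count_values($chars);    // ['a' => 2, 'e' => 2, 'c' => 1, 'f' => 2, 'd' => 1]
$once   = array_intersect($counts, [1]); // ['c' => 1, 'd' => 1]
$unique = array_keys($once);             // ['c', 'd']

// array_count_values() keeps keys in order of first appearance, and each
// surviving character appears only once, so the original order is preserved.
echo implode('', $unique), "\n";         // cd
```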

Edit: alternatively, you can try this, which uses your own regex pattern:

$string = "aecffead";

preg_match_all('/([a-z])(?=.*\1)/', $string, $matches);

echo str_replace($matches[0], "", $string); //Output: cd

Example: https://eval.in/820969
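The second approach works because every character that occurs more than once has at least one occurrence with a later duplicate, so the lookahead captures each repeated character at least once; str_replace() then removes all of its occurrences, the last one included. Below is a sketch generalising it beyond `[a-z]`, assuming valid UTF-8 input: with the `s` flag `.` also matches newlines, and with the `u` flag it matches whole code points, so look-alikes such as Latin `a` and Cyrillic `а` stay distinct. `stripRepeated()` is a hypothetical name.

```php
<?php
// Sketch: remove every character that occurs more than once.
// s flag: '.' also matches newlines; u flag: treat the subject as UTF-8.
// stripRepeated() is a hypothetical helper name.
function stripRepeated(string $string): string
{
    // Each repeated character is captured at least once, at an occurrence
    // that still has a later duplicate. Returns false on invalid UTF-8.
    if (preg_match_all('/(.)(?=.*\1)/su', $string, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Remove every occurrence of each repeated character.
    return str_replace(array_unique($matches[1]), '', $string);
}

echo stripRepeated("aecffead"), "\n"; // cd
```

Because the lookahead rescans the rest of the string at every position, this is quadratic in the string length; for long strings the counting approach in the first part of the answer scales better.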

edited Jun 22 '17 at 15:00

answered Jun 22 '17 at 14:08


It's quite a few functions, but it does the job perfectly, so thank you! :) Even though it isn't the regex I asked for, I will accept this answer because it's the only (and best) answer so far that returns the result I wanted. – Erik van de Ven Jun 22 '17 at 14:14
I added another method using your pattern; it seems to work well. I guess it's better than the first solution. – modsfabio Jun 22 '17 at 14:55
