
I would like to match all recurring characters in a string. So if I have:

aecffead

the regex should match all characters except c and d (because those characters only occur once).

I already have this regex, which matches every character that occurs again later in the string:

/([a-z])(?=.*\1)/g

It matches only the characters marked in brackets: [a][e]c[f]fead

But it should match these characters: [a][e]c[f][f][e][a]d

My regex doesn't match the last "fea" because those characters don't occur again later in the string. But since they already occurred earlier, I want to match them as well.
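A small sketch (not part of the original question) makes this visible: with the PREG_OFFSET_CAPTURE flag, preg_match_all() reports the byte offset of every match, so you can check that only offsets 0, 1 and 3 match, while the trailing f, e and a at offsets 4 to 6 do not, because nothing after them repeats them.

```php
<?php
// Show which characters the question's lookahead pattern matches, with byte offsets.
$string = 'aecffead';
preg_match_all('/([a-z])(?=.*\1)/', $string, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE, $matches[0] holds [matchedText, offset] pairs.
$chars   = array_column($matches[0], 0);
$offsets = array_column($matches[0], 1);

echo implode(',', $chars), ' at ', implode(',', $offsets), "\n"; // a,e,f at 0,1,3
```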

Does anyone know how to fix this? I thought about combining a positive lookahead with a positive lookbehind, but I couldn't get it to work.

EDIT:

Just to clarify: the idea is to remove all characters that occur more than once. So not only the duplicates: if the character a occurs more than once, I want to remove every a from the string.

Erik van de Ven
  • It does not look like a problem that should be solved with a PCRE regex. You need an infinite width lookbehind for that. – Wiktor Stribiżew Jun 22 '17 at 14:00
  • Your regex has a limitation in that the final occurrence of a character will never match the regex pattern, because a further duplicate never occurs for that character. Update: Based on Wiktor's comment, maybe you should consider something like iterating over the string by character and checking duplicate logic some other way. – Tim Biegeleisen Jun 22 '17 at 14:00
  • maybe this answer helps: https://stackoverflow.com/a/18305820/2249798 – m13r Jun 22 '17 at 14:01
  • Note that this would not account for characters that *look* the same but are in fact different, such as characters from different character sets, e.g. Cyrillic `a` and Latin `a`. – Martin Jun 22 '17 at 14:03
  • Agreed! You need a fixed width to look back from. What is the scope of the search? How long can your string be? – Jaymin Jun 22 '17 at 14:03
  • What do you want to do with the matched characters? – modsfabio Jun 22 '17 at 14:05
  • @modsfabio, check my edited question. @JayminsFakeAccount there is no fixed size, but I'm using PHP, so I could probably build the regex based on the character count that PHP returns. – Erik van de Ven Jun 22 '17 at 14:07
  • Possible duplicate of [Remove all duplicate characters in a string?](https://stackoverflow.com/questions/18305797/remove-all-duplicate-characters-in-a-string) – m13r Jun 22 '17 at 14:08
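Tim's comment suggests iterating over the string instead of relying on a regex. A minimal sketch of that idea, assuming a single-byte string (the function name removeRepeatedChars is hypothetical): count every byte once with count_chars(), then keep only the bytes whose count is 1, in their original order.

```php
<?php
// Sketch of the "iterate over the string" idea from the comments.
// Assumes a single-byte encoding: $s[$i] and count_chars() work on bytes,
// so multibyte UTF-8 characters are not handled correctly.
function removeRepeatedChars(string $s): string
{
    $counts = count_chars($s, 1); // byte value => number of occurrences (only bytes that occur)

    $result = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        // Keep the byte only if it occurs exactly once in the whole string.
        if ($counts[ord($s[$i])] === 1) {
            $result .= $s[$i];
        }
    }
    return $result;
}

echo removeRepeatedChars('aecffead'), "\n"; // cd
```

Both passes are linear in the string length, whereas a lookahead such as (?=.*\1) rescans the rest of the string at every position.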

1 Answer


You can convert the string to an array:

$string = "aecffead";
var_export(array_keys(array_intersect(array_count_values(str_split($string)),[1])));

Output:

array (
  0 => 'c',
  1 => 'd',
)

This counts how often each character occurs with array_count_values(), uses array_intersect() to keep only the characters whose count is 1, and then uses array_keys() to turn those characters (the array keys) into the values of a zero-indexed array.

Additionally, you can convert the array back to a string using implode().
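A short sketch of that last step, reusing the answer's expression. Because every kept character occurs exactly once, joining them in order of first appearance also gives the original string with all repeated characters removed. Note that str_split() splits by byte, so this assumes single-byte characters.

```php
<?php
$string = "aecffead";

// Characters that occur exactly once, in order of first appearance.
$singles = array_keys(array_intersect(array_count_values(str_split($string)), [1]));

// Join them back into a string: the input with every repeated character removed.
$result = implode('', $singles);

echo $result, "\n"; // cd
```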

Example: https://eval.in/820930


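Edit: alternatively, you can try this (using your regex pattern). str_replace() removes every occurrence of each matched character, so the later copies that the pattern itself misses are removed too: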

$string = "aecffead";

preg_match_all('/([a-z])(?=.*\1)/', $string, $matches);

echo str_replace($matches[0], "", $string); //Output: cd

Example: https://eval.in/820969
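A possible generalization of the same idea beyond [a-z], sketched under the assumption that the input is valid UTF-8 (the function name removeRepeatedCharsUtf8 is hypothetical). The u modifier makes . match whole code points rather than bytes, and s lets .* also cross newlines, so repeated characters on different lines are detected as well.

```php
<?php
// Sketch: the answer's pattern generalized from [a-z] to any character.
// Assumes valid UTF-8 input (required by the u modifier).
function removeRepeatedCharsUtf8(string $s): string
{
    // Group 1 captures each character that occurs again somewhere later.
    preg_match_all('/(.)(?=.*\1)/su', $s, $matches);

    // Every repeated character has an earlier occurrence that matched, so
    // removing all occurrences of the captured characters removes them everywhere.
    return str_replace(array_unique($matches[1]), '', $s);
}

echo removeRepeatedCharsUtf8('aecffead'), "\n"; // cd
```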

modsfabio
  • Quite a few functions, but it does the job perfectly, so thank you! :) Even though it isn't a regex like I asked for, I will accept this answer because it's the only (and best) answer so far that returns the result I asked for. – Erik van de Ven Jun 22 '17 at 14:14
  • I added another method using your pattern; it seems to work well. I guess it's better than the first solution. – modsfabio Jun 22 '17 at 14:55