
I have been searching all over the internet for a solution, but could not find one.

I need to remove duplicate characters within a string, but I would also like an exception that allows up to a given integer number of repeated characters to remain in the string.

For example, I tried the following:

$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';

$t1 = preg_replace('/(.)\1{3,}/','',$str);
$t2 = preg_replace('/(\S)\1{3,}/','',$str);
$t3 = preg_replace('{(.)\1+}','$1',$str);
$t4 = preg_replace("/[;,:\s]+/",',',$str);
$t5 = preg_replace('/\W/', '', $str);
$t6 = preg_replace( "/[^a-z]/i", "", $str);

echo '$t1 = '.$t1.'<br>';
echo '$t2 = '.$t2.'<br>';
echo '$t3 = '.$t3.'<br>';
echo '$t4 = '.$t4.'<br>';
echo '$t5 = '.$t5.'<br>';
echo '$t6 = '.$t6.'<br>';

Results:

$t1 = This is a 999-999- test Sammy hello 
$t2 = This is a 999-999- test Sammy hello 
$t3 = This -is* a b 9-9-9 * 8 test 4 *#Samy! # helo !
$t4 = This,----------is********,a,bbbb,999-999-9999,********,8888888888,test,4444444444,********##########Sammy!!!!!!,######,hello,!!!!!!
$t5 = Thisisabbbb99999999998888888888test4444444444Sammyhello
$t6 = ThisisabbbbtestSammyhello

The desired output would be:

This ---is*** a bbbb 999-999-9999 *** 8888888888 test 4444444444 ***###Sammy!!! ### hello !!!

As you can see, the desired output leaves the numbers alone and keeps only 3 repeated characters, e.g. ---, ###, *** and !!!.

I would like to be able to change the limit from 3 to any other integer, if possible.

Thanks in advance.

Sammy
  • `/([^0-9])\1{3,}/`? If you want to allow digits to repeat, then exclude digits from the repetition check. – Marc B May 25 '12 at 21:14
  • Thanks, but how would the entire preg_replace statement be written to incorporate this? – Sammy May 25 '12 at 21:20
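
A minimal sketch of the full call asked about in the last comment, assuming the goal is to keep three copies rather than delete the run. Marc B's pattern matches a non-digit followed by three or more copies of itself (a run of 4+), and the replacement `'$1$1$1'` (my choice, not part of the comment) writes back exactly three copies. Letters are not excluded, so `bbbb` becomes `bbb`, which differs from the desired output in the question.

```php
<?php
// Plug Marc B's pattern into a complete preg_replace() call.
// /([^0-9])\1{3,}/ matches one non-digit plus 3 or more copies of it (4+ total).
// Replacing the whole match with '$1$1$1' writes back exactly three copies.
$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';

$result = preg_replace('/([^0-9])\1{3,}/', '$1$1$1', $str);

echo $result, "\n";
```

Digit runs such as `9999` and `8888888888` never match `[^0-9]`, so they pass through unchanged.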

2 Answers


This will do it:

preg_replace('/(([^\d])\2\2)\2+/', '$1', $str);

[^\d] matches a single character which isn't a digit.
\2 is a back-reference to the character captured by the second group, i.e. the non-digit matched by [^\d].
$1 refers to the first captured group, which holds the first three repeated characters, so the extra characters matched by \2+ are stripped off.
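
To make the group numbering concrete, the sketch below runs the pattern above with preg_match() on a made-up sample string and then with preg_replace(). Runs of three or fewer, and runs of digits, are left untouched because they never reach the `\2+` part.

```php
<?php
// Group 2 captures one non-digit, \2\2 requires two more copies (so group 1
// holds the first three), and \2+ requires at least a fourth copy.
$pattern = '/(([^\d])\2\2)\2+/';

preg_match($pattern, 'x----------y', $m);
// $m[0]: the whole 10-dash run, $m[1]: '---', $m[2]: '-'
var_dump($m[0], $m[1], $m[2]);

// Replacing each whole match with $1 keeps only the first three characters.
echo preg_replace($pattern, '$1', 'x----------y'), "\n"; // x---y
// Runs of three or fewer, and digit runs, never match, so they are unchanged.
echo preg_replace($pattern, '$1', 'a***b 9999'), "\n";   // a***b 9999
```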


Paul
  • If I wanted to retain 4 repeated characters, would I change all the 2's to 3's and so on? I tried using 3's but that did not work. – Sammy May 25 '12 at 23:18
  • @Sammy No, `\2` refers to the character captured by the second set of parentheses (the character that matched `[^\d]`). To retain 4 characters you would change `\2\2` to `\2\2\2`. That keeps one extra character; once you need that many, though, it's better to write `\2{3}`, which is shorthand for `\2` three times. E.g. `\2{5}` is shorthand for `\2\2\2\2\2`. – Paul May 25 '12 at 23:20
  • Hm, this doesn't work. I use a different regex which does the trick: preg_replace('{([^\w])\1+}','',$str); – besimple Jan 05 '15 at 15:32
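
Following the `\2{3}` suggestion in the comments, a small sketch that keeps up to four repeats by putting a quantifier on the back-reference. The sample string is made up for illustration.

```php
<?php
// Keep at most four identical non-digits in a row: \2{3} asks for three more
// copies of the captured character, so group 1 holds four; \2+ consumes the rest.
$pattern = '/(([^\d])\2{3})\2+/';

echo preg_replace($pattern, '$1', 'a ---------- b !!!!! c 99999999'), "\n"; // a ---- b !!!! c 99999999
```

A run of exactly four (e.g. `****`) does not match, because `\2+` needs a fifth copy, so it is left as-is.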

The regex you are looking for is /((.)\2{2})\2*/. To allow n repeated characters instead of 3, put n-1 in the curly braces: /((.)\2{n-1})\2*/

EDIT: To leave digits (or anything else) alone, replace . with a different character class, for example [^\d]: /(([^\d])\2{2})\2*/
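
The n-1 rule above can be wrapped in a small function. `limit_repeats` and its `$class` parameter are hypothetical names chosen for this sketch, not from either answer. `$class` is pasted into the pattern verbatim, so it must match a single character and must not contain capturing groups of its own, otherwise `\2` would point at the wrong group.

```php
<?php
// Hypothetical helper: collapse any run of more than $n identical characters
// matched by $class down to exactly $n, using the /((X)\2{n-1})\2*/ pattern.
function limit_repeats(string $str, int $n, string $class = '[^\d]'): string
{
    if ($n < 1) {
        throw new InvalidArgumentException('$n must be at least 1');
    }
    // $n = 3 with the default $class builds '/(([^\d])\2{2})\2*/'.
    $pattern = sprintf('/((%s)\2{%d})\2*/', $class, $n - 1);
    $result = preg_replace($pattern, '$1', $str);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }
    return $result;
}

$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';
echo limit_repeats($str, 3), "\n";      // digits untouched, other runs cut to 3
echo limit_repeats($str, 2, '.'), "\n"; // every run, digits included, cut to 2
```

Because `\2*` also accepts zero extra copies, runs of exactly n (or shorter single characters) match too, but they are replaced by themselves, so the output is the same as with `\2+`.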

Rocco