I need to remove punctuation, except parentheses, from a string. I have come up with the following:

$clean = preg_replace ( "/[^\.\,\-\_\'\"\@\?\!\:\$ a-zA-Z0-9()]/", "", $maybedirty );

That seemed to work OK until I realized that I need to let through some UTF-8 encoded characters (East European). Although I found a number of suggested solutions, I have so far failed to make them work (or to understand them, or both). So the question is: how can I modify the regex to allow UTF-8 encoded characters?

1 Answer

$clean = preg_replace('/[^\w\s()]/', '', $maybedirty);

Regex Explanation:

[^\w\s()]

Match any single character NOT present in the list below «[^\w\s()]»
   A “word character” (Unicode; any letter or ideograph, any number, underscore) «\w»
   A “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s»
   A single character from the list “()” «()»
Pedro Lobito
  • What Pedro Lobito suggested did not work on the first attempt, but the accompanying explanation gave me enough pointers on what to look for, and I finally found the `/u` modifier (intended for UTF-8 characters). So the final solution looks like this: `$clean = preg_replace('/[^\w\s()]/u', '', $maybedirty);` This works, except that it still lets the underscore through, which should be removed. – user3626099 Dec 26 '16 at 15:15
  • I'm glad it worked out for you! – Pedro Lobito Dec 26 '16 at 18:50
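
The underscore problem raised in the comments could be handled by replacing `\w` with explicit Unicode property classes. The following is only a minimal sketch, not the solution anyone in the thread settled on: the function name `strip_punctuation` is hypothetical, and it assumes the input is valid UTF-8 with precomposed accented letters.

```php
<?php
// Hypothetical helper; the name is not from the original thread.
// Keeps letters, digits, whitespace and parentheses, and drops everything
// else, including the underscore that \w would let through.
function strip_punctuation(string $text): string
{
    // \p{L} matches any Unicode letter, \p{N} any Unicode number.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $clean = preg_replace('/[^\p{L}\p{N}\s()]/u', '', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8),
    // so fall back to an empty string in that case.
    return $clean ?? '';
}

echo strip_punctuation("Žluťoučký_kůň (test), 123!"), "\n";
// prints: Žluťoučkýkůň (test) 123
```

If the input may contain decomposed characters (a base letter followed by a combining accent), adding `\p{M}` to the class should keep the combining marks as well.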