25

How can I use PHP to strip out all characters that are NOT letters, numbers, spaces, or punctuation marks?

I've tried the following, but it strips punctuation.

preg_replace("/[^a-zA-Z0-9\s]/", "", $str);
mickmackusa
Tedd

4 Answers

32
preg_replace("/[^a-zA-Z0-9\s\p{P}]/", "", $str);

Example:

php > echo preg_replace("/[^a-zA-Z0-9\s\p{P}]/", "", "⟺f✆oo☃. ba⟗r!");
foo. bar!

\p{P} matches all Unicode punctuation characters (see Unicode character properties). If you only want to allow specific punctuation, simply add those characters to the negated character class, e.g.:

preg_replace("/[^a-zA-Z0-9\s.?!]/", "", $str);
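
One caveat worth illustrating: Unicode classifies characters such as $ + < = > ^ ` | ~ as symbols (\p{S}), not punctuation, so \p{P} strips them. The sketch below compares \p{P} with the POSIX class [:punct:], which covers every ASCII punctuation and symbol character. The function names are hypothetical, and the `u` modifier is added here on the assumption that the input is UTF-8.

```php
<?php
// Hypothetical helper names, for illustration only.
function keepUnicodePunctuation(string $str): ?string
{
    // \p{P} is the Unicode punctuation category; /u treats pattern and subject as UTF-8.
    return preg_replace('/[^a-zA-Z0-9\s\p{P}]/u', '', $str);
}

function keepAsciiPunctuation(string $str): ?string
{
    // [:punct:] is a POSIX class covering all ASCII punctuation and symbol characters.
    return preg_replace('/[^a-zA-Z0-9\s[:punct:]]/u', '', $str);
}

// "\u{2603}" is a snowman, written as an escape so the file encoding does not matter.
$input = "a+b=c; \$5 (ok)!\u{2603}";
echo keepUnicodePunctuation($input), "\n"; // abc; 5 (ok)!
echo keepAsciiPunctuation($input), "\n";   // a+b=c; $5 (ok)!
```

Which one fits depends on whether "punctuation" in your data is meant to include those ASCII symbols.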
Matthew Flaschen
  • The second would. The first allows all punctuation. – Matthew Flaschen Jun 16 '10 at 02:36
  • These seem to strip ALL characters :( – Tedd Jun 16 '10 at 02:42
  • I'm using your first example and this seems to strip all characters. What am I doing wrong? – Tedd Jun 16 '10 at 02:45
  • @Tedd, not sure. I posted a tested example. The [docs](http://www.php.net/manual/en/regexp.reference.unicode.php) mention a couple caveats. You have to use PHP after 4.4 or 5.1 (depending on branch), and UTF-8, and the PCRE library has to be compiled with `--enable-unicode-properties` – Matthew Flaschen Jun 16 '10 at 16:00
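
One possible explanation for "strips all characters" (an assumption, since the thread never confirms the cause): preg_replace returns NULL when it fails, and echoing NULL prints nothing. That can happen if the pattern does not compile (PHP also raises a warning in that case) or, with the `u` modifier, if the subject is not valid UTF-8. A minimal sketch that makes the failure visible, using a hypothetical wrapper name:

```php
<?php
// Hypothetical wrapper name, for illustration only.
function stripWithCheck(string $str): string
{
    $result = preg_replace('/[^a-zA-Z0-9\s\p{P}]/u', '', $str);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

echo stripWithCheck("foo\u{2603}. bar!"), "\n"; // foo. bar!

try {
    // "\xFF" is never valid in UTF-8, so /u makes preg_replace return NULL here.
    echo stripWithCheck("foo\xFF. bar!"), "\n";
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // the code reported is PREG_BAD_UTF8_ERROR
}
```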
3

You're going to have to list the punctuation explicitly, as there is no single-letter shorthand for it (e.g. \s is shorthand for whitespace characters).

preg_replace('/[^a-zA-Z0-9\s\-=+\|!@#$%^&*()`~\[\]{};:\'",<.>\/?]/', '', $str);
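
Hand-escaping a long list is easy to get wrong. As a sketch of an alternative, the allowed punctuation can be kept as a plain string and escaped with preg_quote before it is placed in the character class. The function and parameter names below are hypothetical.

```php
<?php
// Hypothetical helper; $allowedPunctuation is whatever punctuation you want to keep.
function stripExcept(string $str, string $allowedPunctuation): ?string
{
    // preg_quote() escapes regex metacharacters; the second argument also escapes the delimiter.
    $class = preg_quote($allowedPunctuation, '/');
    return preg_replace('/[^a-zA-Z0-9\s' . $class . ']/', '', $str);
}

// The same punctuation list as the pattern above, written without regex escaping.
$punctuation = '-=+|!@#$%^&*()`~[]{};:\'",<.>/?';
echo stripExcept("foo\u{2603}. ba_r!", $punctuation), "\n"; // foo. bar!
```

The underscore is stripped in this example because, as in the pattern above, it is not in the list.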
cletus
0

Let's build a multibyte-safe/unicode-safe pattern for this task.

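From https://www.regular-expressions.info/unicode.html: \p{L} matches any kind of letter from any language, \p{Z} any kind of whitespace or invisible separator, \p{N} any kind of numeric character in any script, and \p{P} any kind of punctuation character.

Code: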

echo preg_replace('/[^\p{L}\p{Z}\p{N}\p{P}]+/u', '', $string);
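
A small usage sketch of this pattern on accented input, with a hypothetical wrapper name. Note that \p{Z} covers space separators but not tabs or newlines, which Unicode classes as control characters; add \s to the class if those should survive.

```php
<?php
// Hypothetical wrapper name, for illustration only.
function stripUnicodeSafe(string $string): ?string
{
    // L = letters, Z = separators, N = numbers, P = punctuation; /u enables UTF-8 mode.
    return preg_replace('/[^\p{L}\p{Z}\p{N}\p{P}]+/u', '', $string);
}

// "\u{...}" escapes (PHP 7+) keep this example independent of the file's encoding.
$input = "na\u{00EF}ve\u{2603} caf\u{00E9}\u{20AC}, 10!";
echo stripUnicodeSafe($input), "\n"; // naïve café, 10!
```

The snowman and the euro sign are removed because both are symbols, while the accented letters are kept.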
mickmackusa
0
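// Trim surrounding whitespace, then any leading/trailing control characters.
$str = trim($str);
$str = trim($str, "\x00..\x1F");
// Replace common HTML entities with a space.
$str = str_replace(array("&quot;", "&#039;", "&amp;", "&lt;", "&gt;"), ' ', $str);
// Replace everything except ASCII letters, digits, and hyphens with a space.
$str = preg_replace('/[^0-9a-zA-Z-]/', ' ', $str);
// Collapse runs of whitespace into a single space.
$str = preg_replace('/\s\s+/', ' ', $str);
$str = trim($str);
// Join the remaining words with hyphens.
$str = preg_replace('/[ ]/', '-', $str);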

Hope this helps.

Yuck
MojganK
  • This answer is missing its educational explanation. It appears to be implementing a different set of rules (different from what the question asks for). – mickmackusa Nov 26 '22 at 11:57