
I need to remove all characters in a string except dashes, letters, numbers, spaces, and underscores.

Various answers on SO come tantalizingly close ("Replace all characters except letters, numbers, spaces and underscores", "Remove all characters except letters, spaces and apostrophes", etc.) but generally don't include dashes.

Help would be greatly appreciated.

Ben Shoval
  • You accepted an answer that will remove whitespace. What you need is `preg_replace('/[^\s\w-]/', '', $old);` and if you work with Unicode, `'/[^\s\w\p{M}\p{Pd}]/u'` (where `\p{Pd}` is any dash). – Wiktor Stribiżew Apr 14 '17 at 06:51
  • @WiktorStribiżew I've been using it for several hours now and it works fine with spaces. – Ben Shoval Apr 14 '17 at 08:06
  • Ok, but the question mismatches the answer. – Wiktor Stribiżew Apr 14 '17 at 08:07
  • @WiktorStribiżew It deals with spaces just fine. Spaces are an integral part of the issue I'm working with, and the accepted answer works with spaces. – Ben Shoval Apr 14 '17 at 08:08
  • It [removes them](https://ideone.com/IjwMlF). You wrote *I need to **remove all** characters in a string **except** dashes, letters, numbers, **spaces**, and underscores.* – Wiktor Stribiżew Apr 14 '17 at 08:09
  • @WiktorStribiżew After further testing, you are correct. I'm not sure why, but the text I'm working with, which includes spaces (decoded from %27 or +) as part of GET calls, does not lose its spaces with Pedro's solution. But when I try it with a normal string, the spaces go away. Thank you for the heads up. If you have an explanation of the results I'm getting with my command line strings, I'd appreciate it. – Ben Shoval Apr 14 '17 at 08:19
  • If you post your *relevant* PHP code, I could help, but right now, your question looks off-topic to me. – Wiktor Stribiżew Apr 14 '17 at 09:28
  • I changed the accepted answer to cosmoonot's. That code deals with the spaces correctly. Thanks everyone for the feedback. – Ben Shoval Apr 15 '17 at 14:59
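
To illustrate the point discussed in the comments above, here is a minimal sketch comparing a class without whitespace to one that includes `\s`. The sample string and variable names are made up for illustration and are not taken from the question.

```php
<?php
// Hypothetical sample input, not from the original question.
$old = 'Hello, World! foo_bar-baz (42)';

// Whitespace is not listed in the class, so spaces are removed too.
$withoutSpaces = preg_replace('/[^\w-]/', '', $old);

// Adding \s keeps whitespace (spaces, tabs, newlines).
$withSpaces = preg_replace('/[^\s\w-]/', '', $old);

echo $withoutSpaces, PHP_EOL; // HelloWorldfoo_bar-baz42
echo $withSpaces, PHP_EOL;    // Hello World foo_bar-baz 42
```

Note that `\s` keeps tabs and line breaks as well; if only the plain space character should survive, a literal space in the class (`[^ \w-]`) is the narrower choice.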

2 Answers


You could do something like below:

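    // Sample input containing characters that should be stripped.
    $string = ';")<br>kk23how nowbrowncow_-asdjhajsdhasdk32423ASDASD*%$@#!^ASDASDSA4sadfasd_-?!';
    // Remove everything except spaces, word characters (\w) and hyphens.
    $new_string = preg_replace('/[^ \w-]/', '', $string);
    echo $new_string;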
bobble bubble
cosmoonot
  • Made a small edit, please check. How about attaching the `+` quantifier to the class so it matches one or more characters at a time? – bobble bubble Apr 14 '17 at 10:18
  • How can I add another character that should not be matched? Besides "-", I also want to exclude ".". – Xiaolin Wu Aug 02 '21 at 01:31
  • `\w` comes with an important caveat: "`\w` stands for “word character”. It always matches the ASCII characters `[A-Za-z0-9_]`. Notice the inclusion of the underscore and digits. In most flavors that support Unicode, `\w` includes many characters from other scripts. There is a lot of inconsistency about which characters are actually included..." – jave.web Aug 15 '23 at 14:02
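
As a possible follow-up to the comments above, this sketch shows one way to keep "." in addition to "-", with the `+` quantifier attached to the class. The input string is hypothetical. Inside a character class "." is a literal dot, and leaving the hyphen last keeps it literal as well.

```php
<?php
// Hypothetical sample input, not from the original thread.
$old = 'file.name-v1_2 (final)!.txt';

// Keep spaces, word characters, dots and hyphens; strip everything else.
// The + matches runs of unwanted characters in one go; the result is the
// same as without it.
$new = preg_replace('/[^ \w.-]+/', '', $old);

echo $new, PHP_EOL; // file.name-v1_2 final.txt
```

Further characters can be added to the class the same way, as long as the hyphen stays at the end (or is escaped as `\-`).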

You probably need something like:

    $new = preg_replace('/[^ \w-]/', '', $old);

Explanation:

    [^ \w-]

Match any single character NOT present in the list below «[^ \w-]»
   The literal character “ ” « »
   A “word character” (letters, digits and underscore; ASCII-only by default in PHP, extended to Unicode letters and digits when the `u` modifier is used) «\w»
   The literal character “-” «-»
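
How far `\w` reaches depends on the `u` modifier. The following is a minimal sketch, assuming the source file and the input are UTF-8 encoded and PCRE was built with Unicode support; the sample string is hypothetical.

```php
<?php
// Hypothetical UTF-8 sample input with accented letters.
$old = 'Zoë & René: naïve_test-1';

// Without the u modifier the subject is processed byte by byte, so
// accented letters are typically stripped along with the punctuation.
$ascii = preg_replace('/[^ \w-]/', '', $old);

// With the u modifier, pattern and subject are treated as UTF-8 and
// \w also covers letters outside ASCII.
$unicode = preg_replace('/[^ \w-]/u', '', $old);

echo $ascii, PHP_EOL;
echo $unicode, PHP_EOL; // Zoë  René naïve_test-1
```

The two spaces after "Zoë" are the ones that surrounded the removed "&". The exact result of the first call can vary with the locale, which is why no output is shown for it.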


Pedro Lobito