
I'm pretty new to PHP, and I noticed there are many different ways of handling regular expressions.

This is what I'm currently using:

$join = "the original string i'm parsing through";
$replace = array(" ", ".", ",", "'", "@");
$newString = str_replace($replace, "_", $join);

I want to replace everything which isn't a-z, A-Z, or 0-9. I'm looking for the reverse of the above: instead of listing the characters to replace, list the characters to keep. A pseudocode way to write it would be:

If characters in $join are not equal to a-z,A-Z,0-9 then change characters in $join to "_"

hakre
Ben McRae

3 Answers

$newString = preg_replace('/[^a-z0-9]/i', '_', $join);

This should do the trick.
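
As a minimal sketch of what this call does, assuming a made-up sample value for $join (the string below is illustrative, not from the question):

```php
<?php
// Hypothetical sample input; any string works here.
$join = "Hello, World! 123";

// Replace every character that is not a letter or digit with "_".
// The trailing "i" modifier makes a-z match A-Z as well.
$newString = preg_replace('/[^a-z0-9]/i', '_', $join);

echo $newString, "\n"; // prints Hello__World__123
```

Each disallowed character gets its own underscore, which is why the comma-plus-space pair turns into two underscores.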

runfalk
  • Hi antennen, thanks for the reply! Is this case sensitive, will it accept capitals? Thanks, Ben. – Ben McRae Apr 13 '09 at 20:37
  • That's what the 'i' at the end is for - case insensitive. – ceejayoz Apr 13 '09 at 20:39
  • Note that this regex will replace consecutive occurrences of non-alphanumeric characters with a single _. Thus '@@@' would be replaced with '_', not '___'. Remove the + if you don't want this behavior. – mpen Apr 13 '09 at 20:43
  • Good thing you pointed that out, I normally throw away characters using the same method. The plus is just old habit. Edited since it didn't replicate OP's stated behavior. – runfalk Apr 13 '09 at 20:47
  • Thanks Mark, the plus character is actually quite useful for what I am trying to achieve :) – Ben McRae Apr 13 '09 at 20:48
  • Thanks antennen, Mark! If I wanted to allow certain characters along with a-z0-9, e.g. a backslash, how would I do this? Sorry to ask a question in the comments section. – Ben McRae Apr 13 '09 at 20:51
  • A backslash is a bit special. IIRC it'd be '/[^a-z0-9\\\\]/i' – runfalk Apr 13 '09 at 20:56
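
The two points raised in the comments, the + quantifier and allowing a backslash, can be sketched as follows. The sample string and the variable names $each, $runs and $keepSlash are assumptions for illustration only:

```php
<?php
// Hypothetical sample input: the seven characters a@@@b\c
$join = 'a@@@b\\c';

// Without "+": each disallowed character becomes its own underscore.
$each = preg_replace('/[^a-z0-9]/i', '_', $join);

// With "+": a run of disallowed characters collapses into one underscore.
$runs = preg_replace('/[^a-z0-9]+/i', '_', $join);

// Allowing a literal backslash: four backslashes in the PHP string
// become \\ in the pattern, which the regex engine reads as one backslash.
$keepSlash = preg_replace('/[^a-z0-9\\\\]/i', '_', $join);

echo $each, "\n";      // prints a___b_c
echo $runs, "\n";      // prints a_b_c
echo $keepSlash, "\n"; // prints a___b\c
```

The backslash has to be escaped twice, once for the PHP string literal and once for the regex engine, which is where the four backslashes come from.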

The regular expression for anything which isn't a-z, A-Z, 0-9 is:

preg_replace('/[^a-zA-Z0-9]/', "_", $join);

This is known as a negated character class.

Gavin Miller

The easiest way is this:

preg_replace('/\W/', '_', $join);

\W is the non-word character group. A word character is a-z, A-Z, 0-9, and _. \W matches everything not previously mentioned*.

Edit: the preg functions use Perl-compatible regular expressions (PCRE); the syntax is documented in Perl's perlre man page.

*Edit 2: This assumes the C locale or one of the English locales. Other locales may have accented letters in the word character class. The Unicode locales will only consider characters below code point 128 to be word characters.
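
One practical difference from an explicit [^a-zA-Z0-9] class is how the underscore is treated. It is invisible when the replacement is itself "_", so the sketch below swaps in "-" as the replacement. The sample string and variable names are assumptions for illustration, and the input is plain ASCII so the locale caveat above does not come into play:

```php
<?php
// Hypothetical sample input with an underscore and a space.
$join = "snake_case name";

// \W leaves the underscore alone because "_" counts as a word character.
$viaW = preg_replace('/\W/', '-', $join);

// The explicit negated class treats "_" like any other symbol.
$viaClass = preg_replace('/[^a-zA-Z0-9]/', '-', $join);

echo $viaW, "\n";     // prints snake_case-name
echo $viaClass, "\n"; // prints snake-case-name
```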

Powerlord