Hi, I'm trying to replace all the non-alphanumeric characters in a string like this:

mb_ereg_replace('/[^a-z0-9\s]+/i','-',$string);

The first problem is that it doesn't replace characters like "." in the string.

Second, I would like to add multibyte support to this method so it works for all users' languages.

How can I do that?

Any help appreciated, thanks a lot.

itsme
  • Just add the period to your character list: ^a-z0-9\s. As for "adding multibyte support", if you mean replacing accents and such, I know of no other method than making a huge array with entries like é => e and using it with strtr. – Ariane Jun 14 '13 at 19:31
  • @Ariane So it's OK? I mean, I already handle the period!? And no, the accented characters don't matter ;) – itsme Jun 14 '13 at 19:32

4 Answers

Try the following:

preg_replace('/[^\p{L}0-9\s]+/u', '-', $string);

When the u flag is used on a regular expression, \p{L} matches any character in any of the Unicode letter categories. Note that PCRE, unlike Perl, does not accept the long form \p{Letter}.
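
To make the effect concrete, here is a minimal sketch of the pattern above applied to a made-up sample string (the input is an assumption, not from the question). It assumes the source file and the string are UTF-8 encoded:

```php
<?php
// Hypothetical sample input; any valid UTF-8 string works.
$string = 'Café.au.lait, 100% señor!';

// Replace each run of characters that are not Unicode letters,
// ASCII digits, or whitespace with a single hyphen.
$result = preg_replace('/[^\p{L}0-9\s]+/u', '-', $string);

echo $result, PHP_EOL; // prints: Café-au-lait- 100- señor-
```

Whitespace survives because \s is inside the negated class, and the accented letters are kept intact because the u flag makes the engine read the input as UTF-8 code points rather than single bytes.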

Andrew Clark

It should replace . with -; you're probably mixing up your data in the first place.

As for the multi-byte support, add the u modifier and look into PCRE Unicode properties, namely \p{L} (PCRE does not accept Perl's long name \p{Letter}):

$replaced = preg_replace('~[^0-9\p{L}]+~iu', '-', $string);
Alix Axel
  • Oh, sorry, I actually used mb_ereg_replace(); I've updated the question. @Alix Axel – itsme Jun 14 '13 at 19:46
  • @badbetonbreakbutbedbackbone: Perl Compatible Regular Expressions (`preg`) are more powerful than Extended Regular Expressions (`ereg`). – Alix Axel Jun 15 '13 at 02:20
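
A likely explanation for the original mb_ereg_replace() call leaving "." untouched: unlike the preg_* functions, the mb_ereg_* functions take the pattern without delimiters, so the slashes and the trailing i are matched as literal characters. A minimal sketch, assuming the mbstring extension is loaded and using a made-up sample string:

```php
<?php
// Assumes the mbstring extension is available; $string is a made-up sample.
mb_regex_encoding('UTF-8');
$string = 'foo.bar';

// The "/" delimiters and the "i" are literal pattern text here,
// so the pattern only matches input that itself contains slashes.
$unchanged = mb_ereg_replace('/[^a-z0-9\s]+/i', '-', $string);

// No delimiters; the case-insensitive option goes in the fourth argument.
$replaced = mb_ereg_replace('[^a-z0-9\s]+', '-', $string, 'i');

echo $unchanged, PHP_EOL; // prints: foo.bar
echo $replaced, PHP_EOL;  // prints: foo-bar
```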
The shortest way is:

$result = preg_replace('~\P{Xan}++~u', '-', $string);

\p{Xan} matches letters and numbers in any script, so \P{Xan} matches anything that is not a letter or a number.
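
As a rough illustration on a made-up non-Latin sample (the input string is an assumption, and the file is assumed to be UTF-8 encoded):

```php
<?php
// Hypothetical sample input mixing Cyrillic letters, digits, and punctuation.
$string = 'Привет, мир 42!';

// \P{Xan} matches any character that is neither a letter nor a number;
// the possessive ++ collapses each run into a single hyphen.
$result = preg_replace('~\P{Xan}++~u', '-', $string);

echo $result, PHP_EOL; // prints: Привет-мир-42-
```

Note that this pattern also replaces whitespace, since spaces are neither letters nor numbers; keep \s in an explicit negated class instead if spaces should be preserved.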

Casimir et Hippolyte
  • Thanks, but what about mb_ereg_replace()? Isn't it better? Can you tell me whether I need to switch to preg_replace(), and why? :P – itsme Jun 14 '13 at 20:04
  • @badbetonbreakbutbedbackbone: No. The u modifier at the end lets the pattern handle Unicode with the preg_* functions. IMHO, the mb_ereg functions will disappear. The preg_* functions are faster and support Unicode. – Casimir et Hippolyte Jun 14 '13 at 20:13
This expression does replace dots. For multibyte support, use the u modifier (UTF-8).

Ziarno