PHP - Replace all non-alphanumeric characters, with support for all languages

Question

Hi, I'm trying to replace all the non-alphanumeric characters in a string like this:

mb_ereg_replace('/[^a-z0-9\s]+/i','-',$string);

The first problem is that it doesn't replace characters like "." in the string.

Second, I would like to add multibyte support to this method so it works in all of my users' languages.

How can I do that?

Any help appreciated, thanks a lot.

Just add the period to your character list: `^a-z0-9\s.`. As for "adding multibyte support", if you mean replacing accents and such, I know of no other method than building a huge array with entries like é => e and using it with strtr. – Ariane, Jun 14 '13 at 19:31
@Ariane so is it OK as it is? I mean, I already have the period covered?! And no, accented characters don't matter ;) – itsme, Jun 14 '13 at 19:32
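Ariane's strtr() idea can be sketched roughly like this. The map is a tiny, hypothetical sample (a real one would need many more entries), the name `strip_accents()` is made up for illustration, and the file is assumed to be saved as UTF-8 so the array keys are UTF-8 byte sequences:

```php
<?php
// Hypothetical helper: transliterate a few accented characters to ASCII.
// The map is only a small sample of the "huge array" Ariane describes.
function strip_accents(string $s): string
{
    $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'í' => 'i', 'ó' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ü' => 'u', 'ç' => 'c', 'ñ' => 'n',
    ];
    // With an array argument, strtr() replaces whole keys (longest first),
    // so each multibyte UTF-8 key is matched as one complete byte sequence.
    return strtr($s, $map);
}

echo strip_accents('Café über façade'), PHP_EOL; // Cafe uber facade
```

This only handles characters that are listed in the map; anything else passes through unchanged.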

score 11 · Accepted Answer · answered Jun 14 '13 at 19:33


Try the following:

preg_replace('/[^\p{L}0-9\s]+/u', '-', $string);

When the u flag is used on a regular expression, \p{L} (and \p{Letter}) matches any character in any of the Unicode letter categories.
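A minimal runnable sketch of this pattern, wrapped in a hypothetical helper `dash_non_letters()` (the name is mine, not the answer's). Note that 0-9 keeps only ASCII digits, so numerals from other scripts are replaced:

```php
<?php
// Sketch of the accepted answer's pattern. With /u, PCRE treats the subject
// as UTF-8, so \p{L} matches a letter from any script.
function dash_non_letters(string $s): ?string
{
    // Each run of characters that is not a letter, an ASCII digit, or
    // whitespace collapses into a single "-".
    return preg_replace('/[^\p{L}0-9\s]+/u', '-', $s);
}

echo dash_non_letters('Héllo, wörld. 123'), PHP_EOL; // Héllo- wörld- 123
echo dash_non_letters('Привет, мир!'), PHP_EOL;      // Привет- мир-
```

Whitespace is kept as-is here, which is why the spaces survive next to each dash.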


Andrew Clark

omg dude, sorry, I actually used mb_ereg_replace(); I've updated the question. @F.J – itsme Jun 14 '13 at 19:47
Should I use preg_replace() instead? :P – itsme Jun 14 '13 at 19:53

Sorry I don't know much about `mb_ereg_replace()`, but `preg_replace()` should work so switching is definitely an option. – Andrew Clark Jun 14 '13 at 20:13
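If you want to stay with mb_ereg_replace(), here is a hedged sketch, assuming mbstring is built with its Oniguruma regex support (the helper names are hypothetical). The key difference from preg_* is that mb_ereg_* patterns take no delimiters, and options such as i go in the fourth argument. In the question's '/[^a-z0-9\s]+/i' the slashes and the trailing i are therefore part of the pattern and matched literally, which would explain why a lone . is never replaced:

```php
<?php
// Hedged sketch: requires mbstring compiled with regex (Oniguruma) support.
// mb_ereg_* patterns take NO delimiters; options go in the 4th argument.

// Literal fix of the question's call: ASCII letters/digits and whitespace
// are kept, everything else (including ".") becomes "-".
function dash_non_ascii_alnum_mb(string $s): string
{
    mb_regex_encoding('UTF-8');
    $result = mb_ereg_replace('[^a-z0-9\s]+', '-', $s, 'i');
    if (!is_string($result)) {
        throw new RuntimeException('mb_ereg_replace() failed');
    }
    return $result;
}

// Multibyte-aware variant: \p{L} keeps letters from any script
// (assumes the default Ruby-style syntax used by mb_ereg_*).
function dash_non_letters_mb(string $s): string
{
    mb_regex_encoding('UTF-8');
    $result = mb_ereg_replace('[^\p{L}0-9\s]+', '-', $s);
    if (!is_string($result)) {
        throw new RuntimeException('mb_ereg_replace() failed');
    }
    return $result;
}
```

For example, dash_non_ascii_alnum_mb('A.b, c') should give 'A-b- c', and dash_non_letters_mb('Héllo, wörld.') should give 'Héllo- wörld-'.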

Alix Axel · Answer 2 · 2013-06-14T19:34:14.740

score 1

It should replace . with -; you're probably mixing up your data in the first place.

As for the multi-byte support, add the u modifier and look into PCRE properties, namely \p{Letter}:

$replaced = preg_replace('~[^0-9\p{Letter}]+~iu', '-', $string);
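A hedged sketch of this variant as a hypothetical helper `dash_non_alnum()`. Unlike the accepted answer it does not keep whitespace, so spaces turn into - as well, which suits slug-like output. It uses the short property name \p{L}; whether the long name \p{Letter} is accepted may depend on the PCRE version bundled with your PHP (an assumption worth checking). The i modifier is dropped because \p{L} already covers upper- and lowercase letters:

```php
<?php
// Sketch of Alix Axel's pattern using the short property name \p{L}.
// Every run of characters that is not 0-9 or a letter becomes one "-",
// and that includes whitespace.
function dash_non_alnum(string $s): ?string
{
    return preg_replace('~[^0-9\p{L}]+~u', '-', $s);
}

echo dash_non_alnum('Héllo, wörld.'), PHP_EOL; // Héllo-wörld-
```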

edited Jun 14 '13 at 19:34

answered Jun 14 '13 at 19:28


omg dude, sorry, I actually used mb_ereg_replace(); I've updated the question. @Alix Axel – itsme Jun 14 '13 at 19:46
@badbetonbreakbutbedbackbone: Perl Compatible Regular Expressions (`preg`) are more powerful than Extended Regular Expressions (`ereg`). – Alix Axel Jun 15 '13 at 02:20

Casimir et Hippolyte · Answer 3 · 2013-06-14T20:15:50.913

score 1

The shortest way is:

$result = preg_replace('~\P{Xan}++~u', '-', $string);

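\p{Xan} matches letters and numbers in all languages, so \P{Xan} matches everything that is not a letter or a number. The ++ is a possessive quantifier: it takes one or more such characters in a row without backtracking, so each run becomes a single -.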
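A small sketch of this one-liner (the helper name `dash_non_xan()` is hypothetical). Because \p{Xan} covers all Unicode numbers and not just 0-9, digits from other scripts survive, as the Arabic-Indic ٤٢ does here:

```php
<?php
// \p{Xan} is a PCRE-specific property: letters plus numbers (\p{L} and \p{N}).
// \P{Xan} is its negation; "++" matches one or more possessively.
function dash_non_xan(string $s): ?string
{
    return preg_replace('~\P{Xan}++~u', '-', $s);
}

echo dash_non_xan('Héllo, wörld. ٤٢'), PHP_EOL; // Héllo-wörld-٤٢
```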

edited Jun 14 '13 at 20:15

answered Jun 14 '13 at 20:01

Thanks, but what about mb_ereg_replace()? Isn't it better? Can you tell me whether I need to switch to preg_replace(), and why? :P – itsme Jun 14 '13 at 20:04

@badbetonbreakbutbedbackbone: No. The u modifier at the end makes the pattern handle Unicode (UTF-8) with the preg_* functions. IMHO, the mb_ereg functions will disappear. The preg_* functions are faster and support Unicode. – Casimir et Hippolyte Jun 14 '13 at 20:13
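One practical consequence of the u modifier, sketched here: the subject must be valid UTF-8, otherwise preg_replace() returns null instead of a string and preg_last_error() reports PREG_BAD_UTF8_ERROR. A hypothetical wrapper (`dash_non_xan_strict()` is my name) that turns that into an exception:

```php
<?php
// Hypothetical wrapper around the \P{Xan} pattern that fails loudly.
// With /u, preg_replace() returns null when $s is not valid UTF-8,
// and preg_last_error() is then PREG_BAD_UTF8_ERROR.
function dash_non_xan_strict(string $s): string
{
    $result = preg_replace('~\P{Xan}++~u', '-', $s);
    if ($result === null) {
        throw new RuntimeException('preg_replace() failed, error code ' . preg_last_error());
    }
    return $result;
}

echo dash_non_xan_strict('a.b'), PHP_EOL; // a-b
// dash_non_xan_strict("\xFF") throws: a lone 0xFF byte is not valid UTF-8.
```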

score 0 · Answer 4 · answered Jun 14 '13 at 19:33


This expression does replace dots. For multibyte support, use the u modifier (UTF-8).


Ziarno

omg dude, sorry, I actually used mb_ereg_replace(); I've updated the question. @Ziarno – itsme Jun 14 '13 at 19:48

4 Answers