The preg_replace should not cut out words

Question

My code:

$slug = preg_replace('/[^a-z0-9]+/i', '-', trim(strtolower($_POST["title"])));

For example, when I write Úm Titulo, I get -m-titulo. As you can see, the Ú is lost. When I write Úm Titulo, I should get um-titulo. How can I fix it? The preg_replace should not cut out letters with accents.

Wiktor Stribiżew · Answer 1 · 2018-10-01T19:13:38.010

0

The /[^a-z0-9]+/i pattern matches one or more chars that are not ASCII letters or digits. That means it also matches Я, ё, ł, ę and every other non-ASCII letter, including Ú.

You may use

preg_replace('~[\W_]+~u', '-', $s)

See the regex demo.

Here, [\W_] matches any char that is not a Unicode letter or digit (\W does not match _, thus, _ is added to the character class).

The u modifier makes \W Unicode-aware.
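To make the difference concrete, here is a small sketch that runs both patterns on the title from the question. It assumes the script file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Sample input from the question (assumes this file is UTF-8 encoded).
$title = 'Úm Titulo';

// ASCII-only class: the two bytes that encode "Ú" are not in a-z0-9,
// so they are replaced by a single "-" together.
$asciiOnly = preg_replace('/[^a-z0-9]+/i', '-', $title);

// Unicode-aware class: with the u modifier "Ú" counts as a letter,
// so only the space is replaced.
$unicodeAware = preg_replace('~[\W_]+~u', '-', $title);

echo $asciiOnly, "\n";    // -m-Titulo
echo $unicodeAware, "\n"; // Úm-Titulo
```

The second result keeps the Ú intact, but it is still accented and still uppercase; the pattern alone does not deaccent anything.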

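To also bring equivalent Unicode sequences into one consistent form, normalize the string first. Note that NFKC on its own keeps precomposed letters such as Ú, so accents are removed only if you decompose with NFD and strip the combining marks instead. The NFKC variant looks like this: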

$result = strtolower( preg_replace('~[\W_]+~u', '-', normalizer_normalize($text, Normalizer::NFKC)) );

NOTE: Make sure the intl extension (php_intl.dll on Windows) is enabled.
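For the deaccenting part, one possible approach is to decompose the string with NFD and then delete the combining marks before building the slug. The following is only a sketch under a few assumptions: the intl and mbstring extensions are enabled, the input is valid UTF-8, and `slugify` is a hypothetical helper name that does not come from the question.

```php
<?php
// Hypothetical helper; needs intl (normalizer_normalize) and mbstring (mb_strtolower).
function slugify(string $title): string
{
    // NFD splits "Ú" into "U" followed by a combining acute accent (U+0301).
    $decomposed = normalizer_normalize($title, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Normalization failed (for example, invalid UTF-8): return an empty slug.
        return '';
    }
    // Remove the combining marks so only the base letters remain.
    $stripped = preg_replace('~\p{Mn}+~u', '', $decomposed);
    // Collapse each run of non-alphanumeric chars (and "_") into one hyphen.
    $slug = preg_replace('~[\W_]+~u', '-', $stripped);
    // Drop leading/trailing hyphens and lowercase the result.
    return mb_strtolower(trim($slug, '-'), 'UTF-8');
}

echo slugify('Úm Titulo Muito'), "\n"; // um-titulo-muito
```

In the question's code this would be called as `$slug = slugify($_POST["title"]);`. Letters that have no canonical decomposition, such as ł, pass through unchanged, so a transliteration step would be needed if the slug must be pure ASCII.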

edited Oct 01 '18 at 19:13

answered Oct 01 '18 at 18:53


https://regex101.com/r/5jppdc/3 As you can see, the preg_replace only puts "-" on the first word, and it doesn't replace `Úm` with `um`. – Sabrina Oct 01 '18 at 19:02
@Sabrina If you want to also deaccent the string, you need to use the corresponding function. You won't be able to achieve that with regex only. – Wiktor Stribiżew Oct 01 '18 at 19:03
Is that all I need to do? Another thing: how do I replace my code with yours? Mine is: `$slug = preg_replace('/[^a-z0-9]+/i', '-', trim(strtolower($_POST["title"])));` – Sabrina Oct 01 '18 at 19:24
@Sabrina `$slug = preg_replace('/[\W_]+/u', '-', trim(strtolower(normalizer_normalize($_POST["title"], Normalizer::NFKC))));`. Also, [here are hints on enabling the normalization support](http://php.net/manual/en/intl.installation.php). – Wiktor Stribiżew Oct 01 '18 at 20:25
I put in `Úm Titulo Muito` and it outputs `Úm-titulo-muito`; it should output `um-titulo-muito`... Did you forget something? – Sabrina Oct 03 '18 at 20:41
@Sabrina I can't repro, the `normalizer_normalize($text, Normalizer::NFKC)` part removes accents, `preg_replace('~[\W_]+~u', '-', $res)` replaces all non-alnum char chunks with `-`, and `strtolower` should make it lowercase. – Wiktor Stribiżew Oct 03 '18 at 20:46

score 0 · Answer 2 · answered Oct 01 '18 at 18:58


I think the problem is your regex, as it does not account for letters with accent marks. Refer to the following:

Replacing Accented Characters

