
I'm looking for a multi-byte function to replace preg_match_all(). I need one that will give me an array of matched strings, like the $matches argument from preg_match(). The function mb_ereg_match() doesn't seem to do it -- it only gives me a boolean indicating if there were any matches.

Looking at the mb_* functions page, I don't see anything offhand that replaces the functionality of preg_match(). What do I use?

Edit: I'm an idiot. I originally posted this question asking for a replacement for preg_match, which of course is ereg_match. However, both of those only return the first result. What I wanted was a replacement for preg_match_all, which returns all matched texts. Anyway, the u modifier works in my case for preg_match_all, as hakre pointed out.

user151841
  • http://stackoverflow.com/questions/1766485/are-the-php-preg-functions-multibyte-safe – Griwes Oct 06 '11 at 14:21
  • I note you say that `ereg_match()` is a replacement for `preg_match()`. Be aware that PHP's `ereg_` functions are deprecated and should be avoided. – Spudley Oct 06 '11 at 16:15

2 Answers


Have you taken a look at mb_ereg?

Additionally, you can pass a UTF-8 encoded string to preg_match using the u modifier, which might be the kind of multi-byte support you need. If your string is in another encoding, the other option is to convert it to UTF-8 first and then convert the results back to the original encoding.

See also an answer to a related question: Are the PHP preg_functions multibyte safe?
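As a rough illustration of the u modifier with preg_match_all (the function the question turned out to need), here is a minimal sketch. The helper name `find_all_matches` and the sample text are made up for this example, and it assumes the subject is already valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): returns every full match of $pattern.
// Assumes $pattern carries the /u modifier and $subject is valid UTF-8.
function find_all_matches(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false) {
        // false signals an error, for example malformed UTF-8 in the subject.
        throw new RuntimeException('preg_match_all failed, code ' . preg_last_error());
    }
    return $matches[0]; // index 0 holds the full-pattern matches, in order
}

// Sample text is made up for illustration.
$words = find_all_matches('/\p{L}+/u', 'naïve café, déjà vu');
print_r($words); // naïve, café, déjà, vu
```

Without the u modifier, the pattern and subject are handled byte by byte, so a class like `\p{L}` or a quantifier on a non-ASCII character would not behave as intended.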

hakre
  • Can you point me to some documentation on the `u` modifier? That's part of the regex? – user151841 Oct 06 '11 at 14:58
  • Actually it looks like the 4th answer down on that related question has some info about the `u` modifier. – user151841 Oct 06 '11 at 14:59
  • So I tried it out, and it only seems to return the first match :P Unless I'm doing it wrong. – user151841 Oct 06 '11 at 15:08
  • You should add your code to your question, so it's actually clear what you tried so far. Take care that the input string is UTF-8 encoded if you're using `preg_match` with the `u` modifier. Then I might be able to spot your error. – hakre Oct 06 '11 at 15:10
  • Sorry, what I meant was that `mb_ereg` returns only the first match string (apparently). – user151841 Oct 06 '11 at 15:14
  • I'm an idiot. I'm looking for a replacement for `preg_match_all`! :P – user151841 Oct 06 '11 at 15:30
  • LOL ;), okay. What is the encoding/charset of your string? I ask because if you have this in UTF-8, you don't need any replacement. If not, you need to create a replacement function of your own that consists of `mb_ereg...` functions, doing one match after the other. – hakre Oct 06 '11 at 15:33
  • The `u` modifier is the correct answer. Avoid the `ereg_` (and `mb_ereg_`) functions because they have been deprecated. – Spudley Oct 06 '11 at 16:16
  • @hakre I'm not looking to do replacement, but to pull multiple matches out of a large string. – user151841 Oct 06 '11 at 16:29
  • Find the next match after the offset of the last match + the length of the last match (both 0 at start). Loop until nothing is found any longer. Store matches inside an array. – hakre Oct 06 '11 at 16:34
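The loop described in the last comment could be sketched roughly as follows. This version leans on `mb_ereg_search_init()` and `mb_ereg_search_regs()`, which keep track of the search position internally, rather than doing the offset arithmetic by hand. The helper name `collect_mb_matches` is hypothetical, and the sketch assumes the mbstring extension is built with its regex support. Note that mb_ereg patterns are written without delimiters, and that while the plain `ereg_` functions were removed in PHP 7, the `mb_ereg_` family is still part of mbstring.

```php
<?php
// Hypothetical helper (not part of PHP): collects every full match of an
// mb_ereg-style pattern. Assumes mbstring with regex support is available.
function collect_mb_matches(string $pattern, string $subject, string $encoding = 'UTF-8'): array
{
    // Tell the mb regex engine which encoding the subject uses.
    mb_regex_encoding($encoding);

    $all = [];
    // Sets the subject and pattern, and resets the search position to the start.
    if (!mb_ereg_search_init($subject, $pattern)) {
        return $all;
    }
    // Each call continues after the previous match; false means nothing is left.
    while (($regs = mb_ereg_search_regs()) !== false) {
        if ($regs[0] === '') {
            break; // guard: an empty match may not advance the search position
        }
        $all[] = $regs[0]; // $regs[0] is the full match; $regs[1..] are groups
    }
    return $all;
}

// Sample input is made up for illustration.
$numbers = collect_mb_matches('[0-9]+', 'a1 b22 c333');
print_r($numbers); // 1, 22, 333
```

If the string is already UTF-8, preg_match_all with the u modifier is the simpler route; a loop like this is mainly of interest for other encodings.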

PHP: preg_grep manual

$matches = preg_grep('/(needles|to|find)/u', $inputArray);

preg_grep returns the elements of the input array that match the pattern, indexed using the keys from the input array.

Note the /u modifier, which makes the pattern and the subject strings be treated as UTF-8.
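A small sketch of how the keys are preserved, using a made-up input array. Keep in mind that preg_grep filters whole array elements; it does not extract the matched substrings the way preg_match_all does.

```php
<?php
// Hypothetical input; the values are assumed to be UTF-8 strings.
$inputArray = [
    'first'  => 'größe',
    'second' => 'size',
    'third'  => 'Größenwahn',
];

// /i makes the match case-insensitive, /u treats pattern and subjects as UTF-8.
$matches = preg_grep('/größe/iu', $inputArray);

// The matching elements keep their original keys: 'first' and 'third'.
print_r(array_keys($matches));
```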

Hope it helps others.

MarcoP