I've been searching for UTF-8-safe alternatives to PHP's string manipulation functions and have found many conflicting opinions and suggestions. Can the following functions cause problems with UTF-8 strings, and if so, what should I use instead? I know the list of mb_-prefixed functions in the PHP manual, but it doesn't cover all the functions I'm using.

Functions are: implode, explode, str_replace, preg_match, preg_replace

Thank you

sczdavos
  • The `preg_` family of functions works fine with Unicode, but you'll need to specify (via the pattern modifier) that the string is Unicode. – Aleks G Aug 24 '12 at 14:45
  • @AleksG thanks for responding. Do you also know about implode, explode and str_replace? – sczdavos Aug 24 '12 at 14:47
  • Those are UTF-8 safe, provided everything is valid UTF-8. No valid UTF-8 byte sequence is a sub-bytestring of some other UTF-8 byte sequence. – Esailija Aug 24 '12 at 14:50
  • http://stackoverflow.com/questions/21652261/str-word-count-alternative-for-utf8 – trante Feb 09 '14 at 13:35

2 Answers

explode just looks for an identical byte sequence and splits the string at that point. Since UTF-8 is backwards compatible with ASCII, and the encoding of one character never appears inside the encoding of another, there's no concern and it will work fine on valid UTF-8. implode just concatenates strings, which works fine as well due to the same properties of UTF-8. str_replace works for the same reasons. The preg_ functions work fine as long as you are using the /u modifier.
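As a rough illustration of the points above, here is a minimal sketch. The sample strings and variable names are made up for this example, and it assumes the script file itself is saved as UTF-8.

```php
<?php
// Illustrative sample data (not from the question); save this file as UTF-8.
$csv = 'žluťoučký,kůň,úpěl';

// explode() and implode() compare and concatenate raw bytes,
// which is safe as long as every string involved is valid UTF-8.
$parts  = explode(',', $csv);
$joined = implode(' | ', $parts);

// str_replace() is likewise a byte-wise search and replace.
$replaced = str_replace('kůň', 'pes', $csv);

// Without /u, "." matches a single byte, so a two-byte character fails /^.$/.
$byteMode = preg_match('/^.$/', 'ž');
// With /u, "." matches one whole code point.
$unicodeMode = preg_match('/^.$/u', 'ž');

// Splitting a string into characters with preg_split() and the /u modifier.
$chars = preg_split('//u', 'kůň', -1, PREG_SPLIT_NO_EMPTY);

var_dump($parts, $joined, $replaced, $byteMode, $unicodeMode, $chars);
```

The two `preg_match` calls differ only in the modifier, which is what decides whether the pattern sees bytes or characters.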

deceze

If you need to manipulate UTF-8 characters safely, you can do it like this:

mb_internal_encoding('UTF-8'); // affects only the mb_* functions
$string = preg_replace('`...`u', '...', $string); // the u (Unicode) modifier makes preg_* treat pattern and subject as UTF-8
Peon
  • `mb_internal_encoding` is only useful for `mb_` functions. It has nothing to do with the `preg_` functions. – deceze Aug 24 '12 at 15:08
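
The distinction made in this comment can be checked with a small sketch. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the sample word and variable names are only for illustration.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$word = 'kůň'; // 3 characters, 5 bytes in UTF-8

// Deliberately set an internal encoding that does not match the data.
mb_internal_encoding('ISO-8859-1');
$mbLatin1   = mb_strlen($word);               // 5: every byte counted as one character
$pregLatin1 = preg_match_all('/./u', $word);  // 3: /u still matches whole code points

// Switch to the matching encoding.
mb_internal_encoding('UTF-8');
$mbUtf8   = mb_strlen($word);                 // 3: mb_* now counts characters
$pregUtf8 = preg_match_all('/./u', $word);    // 3: unchanged

var_dump($mbLatin1, $pregLatin1, $mbUtf8, $pregUtf8);
```

Only the `mb_strlen` result changes with the internal encoding; the `preg_match_all` count depends solely on the `/u` modifier.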