
I am trying to get the length of this Unicode string:

$text = 'نام سلطان م';
$length = strlen($text);
echo $length;

Output:

20

How does it determine the length of a Unicode string?

Munib

4 Answers


strlen() does not handle multibyte characters: it assumes that one character equals one byte, which does not hold for Unicode encodings such as UTF-8. This behavior is clearly documented:

strlen() returns the number of bytes rather than the number of characters in a string.

The solution is to use the mb_strlen() function instead ("mb" stands for "multibyte"); see the mb_strlen() docs.
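As a minimal sketch of the difference, here are both functions applied to the string from the question. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the explicit 'UTF-8' argument is there so the result does not depend on the configured default encoding.

```php
<?php
// The string from the question: 9 Arabic letters and 2 spaces.
$text = 'نام سلطان م';

// strlen() counts bytes; in UTF-8 each of these Arabic letters takes 2 bytes.
echo strlen($text), "\n";             // 20 (9 * 2 + 2)

// mb_strlen() counts characters (code points) in the given encoding.
echo mb_strlen($text, 'UTF-8'), "\n"; // 11 (9 + 2)
```

The byte count of 20 matches the output shown in the question.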

Dharman
Marcin Orlowski
  • I wonder, what is the specific point of separating Unicode and non-Unicode functions? Why not always use the `mb_` functions? – Ilia Ross Dec 25 '14 at 09:06
  • In short, it's because PHP's design (as a language) sucks in many places, and that includes UTF-8 support. PHP did not support multibyte encodings internally for ages, which is why the multibyte extension was created. You can have the `mb_` functions used automatically by PHP via function overloading (see http://php.net/manual/en/mbstring.overload.php), but that depends on the PHP config, so it may be better to call the `mb_` functions directly if you cannot ensure they will be used otherwise. – Marcin Orlowski Dec 25 '14 at 09:51
  • Thanks for the explanation, my friend, and especially for pointing out `overload`; I missed that completely. Cheers! – Ilia Ross Dec 26 '14 at 14:51
  • For some reason `mb_strlen($text)` didn't work directly on my system. To be on the safe side, you may want to specify the encoding explicitly: `$len = mb_strlen($text, 'UTF-8');` – tormuto Oct 13 '16 at 10:08
  • You may also want to edit your `php.ini` and set it up there. See: http://php.net/manual/en/mbstring.configuration.php – Marcin Orlowski Oct 14 '16 at 20:50
  • This difference between `strlen` and `mb_strlen` caused a serious error in a financial system we were building. We found out that, because we used `strlen`, we were overcharging some customers who sent SMS messages containing Unicode characters. Be careful guys, don't fall into the same trap we did. – Moses Ndeda Jan 14 '18 at 13:47
  • Unfortunately it also doesn't work with `sprintf` and padding. `mb_vsprintf` and `mb_sprintf` don't exist (?). – KumZ Apr 13 '21 at 21:44
  • @MarcinOrlowski I removed the second "part" of the answer. It made some sense in revision 2 but after that you should have removed it completely. It was only confusing readers. `mb_strlen` is not deprecated. Only the overload configuration was removed. Your answer is suggesting to use `mb_strlen` and the information about the overload was completely unnecessary. If you feel that we removed too much, please improve the answer later, but without adding confusing EDIT statements. The answer should represent the current state as of this year. – Dharman May 15 '23 at 14:29
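
Two points from the comments above can be sketched in code: passing the encoding explicitly, and padding by character count, since `sprintf` and `str_pad` measure width in bytes. The helper name `padRightChars` is hypothetical (it is not a PHP built-in), and the sketch assumes UTF-8 input and a single-byte pad character (the default space). It counts code points, not display width.

```php
<?php
// Hypothetical helper (not a PHP built-in): right-pad $text with spaces to
// $width characters, measured in characters rather than bytes.
function padRightChars(string $text, int $width, string $encoding = 'UTF-8'): string
{
    // str_pad() measures bytes, so widen the target by the gap between
    // the byte count and the character count.
    $extraBytes = strlen($text) - mb_strlen($text, $encoding);
    return str_pad($text, $width + $extraBytes);
}

$padded = padRightChars('نام', 5);      // 3 characters, 6 bytes in UTF-8
echo mb_strlen($padded, 'UTF-8'), "\n"; // 5
echo strlen($padded), "\n";             // 8 (6 bytes of letters + 2 spaces)
```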

You are looking for mb_strlen.

Jon

The strlen function does not count the number of characters, but the number of bytes. For strings containing multibyte characters it will therefore return a number higher than the character count.
Use mb_strlen() instead to get the actual number of characters.

Tomáš Zato

Just as an addendum to the other answers that reference mb_strlen():

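If the php.ini setting mbstring.func_overload has the string-functions flag (value 2) set, then strlen will count characters based on the default charset; otherwise it will count the number of bytes in the string. Note that this setting was deprecated in PHP 7.2 and removed in PHP 8.0.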
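
A rough sketch of how code could check for this setting at runtime, and get a byte count that does not depend on it. The function names `strlenIsOverloaded` and `byteLength` are hypothetical helpers, not part of PHP, and the sketch assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: true when mbstring.func_overload has the
// string-functions flag (value 2) set. ini_get() returns false when the
// setting does not exist (PHP 8.0+), which casts to 0 here.
function strlenIsOverloaded(): bool
{
    return (((int) ini_get('mbstring.func_overload')) & 2) !== 0;
}

// Hypothetical helper: byte count regardless of the overload setting.
// The '8bit' encoding makes mb_strlen() treat every byte as one character.
function byteLength(string $s): int
{
    return mb_strlen($s, '8bit');
}

var_dump(strlenIsOverloaded());
echo byteLength('نام'), "\n"; // 6 (3 letters, 2 bytes each in UTF-8)
```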

Ariel
Mark Baker