69

I am doing a real estate feed for a portal and it is telling me the max length of a string should be 20,000 bytes (20kb), but I have never run across this before.

How can I measure the byte size of a varchar string, so that I can then use a while loop to trim it down?

Hash
Liam Bailey
  • There shouldn't be any problem getting a string to that length, should there? What is it telling you? What errors are you seeing? – bigkm Sep 27 '11 at 12:30
  • Byte size: `strlen()`, e.g. `strlen('a₹')` returns `4`. Character count: `mb_strlen()`, e.g. `mb_strlen('a₹', "UTF-8")` returns `2`. Note: `mb_strlen()` needs the mbstring extension, which is not enabled by default in PHP. – Sathvik Jun 24 '21 at 20:58

5 Answers

98

You can use mb_strlen() to get the byte length by passing an encoding in which every character is exactly one byte, so you don't have to worry about whether the string is multibyte or single-byte. For example, as drake127 says in a comment on the mb_strlen manual page, you can use the '8bit' encoding:

<?php
    $string = 'Cién cañones por banda';
    echo mb_strlen($string, '8bit');
?>

You can run into problems with the strlen function, because PHP has an option to overload strlen so that it actually calls mb_strlen. See http://php.net/manual/en/mbstring.overload.php for more information.

To trim the string by byte length without splitting it in the middle of a multibyte character, you can use:

mb_strcut(string $str, int $start [, int $length [, string $encoding ]] )
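As a minimal sketch of how this could be applied to a byte limit such as the 20,000 bytes from the question: the helper name truncateToBytes and its manual fallback are illustrative assumptions, not part of PHP or of this answer. It assumes the string is UTF-8 and that strlen() is not overloaded (always true on PHP 8.0+).

```php
<?php
// Hypothetical helper: trim a UTF-8 string to at most $maxBytes bytes
// without cutting a multibyte character in half.
function truncateToBytes(string $text, int $maxBytes): string
{
    if ($maxBytes <= 0) {
        return '';
    }
    // strlen() counts bytes when function overloading is not active.
    if (strlen($text) <= $maxBytes) {
        return $text;
    }
    if (function_exists('mb_strcut')) {
        // mb_strcut() works on byte offsets but respects character boundaries.
        return mb_strcut($text, 0, $maxBytes, 'UTF-8');
    }
    // Fallback when mbstring is not loaded: step back while the first
    // excluded byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
    $cut = $maxBytes;
    while ($cut > 0 && (ord($text[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($text, 0, $cut);
}

// Assumes this file is saved as UTF-8.
$string = 'Cién cañones por banda';
echo strlen($string), "\n";              // 24 bytes for 22 characters
echo truncateToBytes($string, 3), "\n";  // "Ci": the 2-byte é does not fit
echo truncateToBytes($string, 4), "\n";  // "Cié"
```

With this in place no while loop is needed: a single call such as truncateToBytes($description, 20000) returns a string that fits the limit.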
PhoneixS
31

You have to figure out whether the string is ASCII-encoded or uses a multi-byte encoding.

In the former case, you can just use strlen.

In the latter case, you need to find the number of bytes per character.

The strlen documentation gives an example of how to do it: http://www.php.net/manual/en/function.strlen.php#72274
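One way to make the first check concrete is sketched below. The function name isPureAscii is a hypothetical helper, not something from the linked manual note, and it only tells you whether any byte falls outside the 7-bit ASCII range; it does not identify which multi-byte encoding is in use.

```php
<?php
// Hypothetical helper: true when every byte is in the 7-bit ASCII range.
function isPureAscii(string $text): bool
{
    // Without the /u modifier PCRE matches byte by byte, so this looks
    // for any single byte with the high bit set (0x80 to 0xFF).
    return preg_match('/[\x80-\xFF]/', $text) === 0;
}

var_dump(isPureAscii('borsc'));              // bool(true)
var_dump(isPureAscii("bor\u{161}\u{10D}"));  // bool(false), this is "boršč" in UTF-8
```

For a pure ASCII string the byte count and the character count are the same number, so either can be compared against the limit.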

Foo Bah
  • 13
    strlen is not mb-safe function and actually returns number of bytes, not of characters. If you want number of characters in multi-byte encoding, you have to use mb_strlen. – Maxim Krizhanovsky Sep 27 '11 at 12:43
  • 12
    @Darhazer it is possible to overload `str*()` into `mb_str*()`, so calling `strlen` will indeed call `mb_strlen`. To see if this is enabled, check `mbstring.func_overload` in php.ini. Also see http://php.net/manual/en/mbstring.overload.php – Carlos Campderrós Sep 27 '11 at 13:06
  • 8
    If you're looking for the number of **bytes** (which is what you asked for - *not* the number of characters) the correct answer was posted by @PhoneixS below; as pointed out by @Carlos `strlen()` isn't safe because it may be overloaded on some PHP installations. – mindplay.dk Jun 27 '14 at 11:24
  • 1
    @CarlosCampderrós Function overloading deprecated in PHP 7.2.0, removed in PHP 8.0.0. https://www.php.net/manual/en/mbstring.overload.php – Buttle Butkus Apr 06 '22 at 20:03
28

Do you mean byte size or string length?

Byte size is measured with strlen(), whereas string length is queried using mb_strlen(). You can use substr() to trim a string to X bytes (note that this will break the string if it has a multi-byte encoding - as pointed out by Darhazer in the comments) and mb_substr() to trim it to X characters in the encoding of the string.
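To illustrate the caveat about substr(), here is a small sketch. It assumes UTF-8 data and uses preg_match() with the /u modifier purely as a validity check; that check is my choice for the demonstration and is not part of the answer.

```php
<?php
// "boršč", written with \u{...} escapes so the bytes are UTF-8
// regardless of how this file itself is encoded.
$word = "bor\u{161}\u{10D}";

$byteLength = strlen($word);   // 7: š and č take two bytes each
$cut = substr($word, 0, 4);    // "bor" plus only the first byte of š

// preg_match() with /u fails on malformed UTF-8 input,
// which makes it a cheap way to detect a broken character.
$wordIsValidUtf8 = preg_match('//u', $word) === 1;
$cutIsValidUtf8 = preg_match('//u', $cut) === 1;

var_dump($byteLength);       // int(7)
var_dump($wordIsValidUtf8);  // bool(true)
var_dump($cutIsValidUtf8);   // bool(false)
```

The truncated value is 4 bytes long, as requested, but it ends in half a character, which many consumers of a feed would reject or display as a replacement symbol.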

soulmerge
  • 4
    strlen doesn't give you byte size. – N.B. Sep 27 '11 at 12:31
  • 9
    @N.B.it gives you exactly the number of bytes... that's why there is mb_strlen() in the mb_ extension. Try strlen on multi-byte character to test... – Maxim Krizhanovsky Sep 27 '11 at 12:45
  • @soulmerge if you use substr() on a multi-byte encoded string, you can break the last character in the string. – Maxim Krizhanovsky Sep 27 '11 at 12:46
  • @Darhazer: You are correct but that's not something I had suggested doing. It is not stated that the string is indeed a multibyte string, either. – soulmerge Sep 27 '11 at 13:12
  • 1
    @N.B. There is a difference between `mb_strlen()` and `strlen()`, i.e. they are not aliases of each other. I have furthermore done enough network programming in PHP to know that it does give the byte length. – soulmerge Sep 27 '11 at 13:15
  • 2
    @soulmerge as Carlos Campderrós said in other answer, it is possible to overload str*() into mb_str*(), so calling strlen will indeed call mb_strlen. To see if this is enabled, check mbstring.func_overload in php.ini. Also see http://php.net/manual/en/mbstring.overload.php – PhoneixS Mar 15 '12 at 10:50
  • 3
    There is now [a note on the PHP manual page for `strlen()`](http://php.net/manual/en/function.strlen.php#refsect1-function.strlen-notes): "strlen() returns the number of bytes rather than the number of characters in a string." Not sure if that was there before, but it confirms that this answer is correct. – J.D. Mar 24 '15 at 21:31
  • 2
    @PhoneixS Luckily, the function overloading "feature" has been removed as of PHP 8.0.0. Deprecated in 7.2.0. So you can now rely on `strlen` to return byte length of a string. – Buttle Butkus Apr 06 '22 at 20:00
5

PHP's strlen() function returns the number of bytes, which matches the number of characters only for ASCII strings.

strlen('borsc') -> 5 (bytes)

strlen('boršč') -> 7 (bytes)

$limit_in_bytes = 20000; // the limit is in bytes, not kilobytes

$pointer = 0;
// Walk through every full chunk; the remainder is handled after the loop.
while (strlen($your_string) > (($pointer + 1) * $limit_in_bytes)) {
    $str_to_handle = substr($your_string, $pointer * $limit_in_bytes, $limit_in_bytes);
    // here you can handle (0 - n) parts of string
    $pointer++;
}

// Note: substr() cuts at byte offsets, so it may split a multibyte character.
$str_to_handle = substr($your_string, $pointer * $limit_in_bytes, $limit_in_bytes);
// here you can handle last part of string

... or you can use a function like this:

// Splits $string into pieces of at most $limit_in_bytes bytes each.
function parseStrToArr($string, $limit_in_bytes){
    $ret = array();

    $pointer = 0;
    // Collect every full chunk.
    while (strlen($string) > (($pointer + 1) * $limit_in_bytes)) {
        $ret[] = substr($string, $pointer * $limit_in_bytes, $limit_in_bytes);
        $pointer++;
    }

    // Collect the last (possibly shorter) chunk.
    $ret[] = substr($string, $pointer * $limit_in_bytes, $limit_in_bytes);

    return $ret;
}

$arr = parseStrToArr($your_string, 20000);
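Because substr() cuts at raw byte offsets, the chunks above can end in the middle of a multibyte character. A possible variant that keeps characters intact is sketched below; the name splitByBytes is hypothetical, the code assumes UTF-8 input and a non-overloaded strlen(), and unlike the function above it returns an empty array for an empty string.

```php
<?php
// Hypothetical variant: split a UTF-8 string into chunks of at most
// $maxBytes bytes, moving each cut back to a character boundary.
function splitByBytes(string $text, int $maxBytes): array
{
    if ($maxBytes < 4) {
        // A single UTF-8 character can need up to 4 bytes.
        throw new InvalidArgumentException('maxBytes must be at least 4');
    }
    $chunks = [];
    $offset = 0;
    $total = strlen($text);
    while ($offset < $total) {
        $end = min($offset + $maxBytes, $total);
        // Step back while the byte right after the cut is a
        // continuation byte (bit pattern 10xxxxxx).
        while ($end < $total && $end > $offset && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        if ($end === $offset) {
            // Malformed input (only continuation bytes): fall back to a hard cut.
            $end = min($offset + $maxBytes, $total);
        }
        $chunks[] = substr($text, $offset, $end - $offset);
        $offset = $end;
    }
    return $chunks;
}

// "boršč" is 7 bytes; a 4-byte limit would otherwise split the š.
$parts = splitByBytes("bor\u{161}\u{10D}", 4);
var_dump($parts === ['bor', "\u{161}\u{10D}"]); // bool(true)
```

Every chunk stays within the limit, and joining the chunks with implode('', $parts) gives back the original string.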
Ivar
mIFO
4

Further to PhoneixS's answer on getting the correct length of a string in bytes: since mb_strlen() is slower than strlen(), for the best performance you can check the "mbstring.func_overload" ini setting so that mb_strlen() is used only when it is really required:

$content_length = ini_get('mbstring.func_overload') ? mb_strlen($content , '8bit') : strlen($content);
chiwangc
Ulver
  • 1
    Thankfully, this check is no longer needed as of PHP 8.0.0. The function overloading "feature" has been removed as of PHP 8.0.0, and deprecated in 7.2.0. So you can now rely on `strlen` to return byte length of a string. – Buttle Butkus Apr 06 '22 at 20:02