strlen($str) is returning negative values for a "huge string" that is created using str_repeat:

<?php

error_reporting(E_STRICT|E_ALL);
echo phpversion(), PHP_EOL; // 5.3.26
echo PHP_INT_MAX, PHP_EOL; // 9223372036854775807
ini_set('memory_limit', '-1');
ini_set('max_execution_time', 0);

$gb = 1024 * 1024 * 1024;
$str = str_repeat('a', 2 * $gb);
echo strlen($str); // prints -2147483648
echo $str[0]; // Notice: Uninitialized string offset: 0

$str2 = str_repeat('a', 4 * $gb);
echo strlen($str2); // prints 0

$str3 = str_repeat('a', 123 + 4 * $gb);
echo strlen($str3); // prints 123

$str4 = str_repeat('a', 6 * $gb); // starts to wrap again...
echo strlen($str4); // prints -2147483648
echo $str4[0]; // Notice: Uninitialized string offset: 0

$str5 = str_repeat('a', 123 + 8 * $gb);
echo strlen($str5); // prints 123

?>

Is this behavior defined?

Or is this a PHP bug?

Pacerier
  • Looks like you ran into the maximum integer value on 32-bit PHP. A 64-bit server would report the accurate length. – ceejayoz Jul 10 '13 at 16:26
  • As strlen returns an integer value, and all integers in PHP are signed integers, and 32-bit PHP uses 32-bit signed integers, then it is implicitly defined. The constant PHP_INT_MAX gives you the upper limit, whereupon the value "wraps" and goes negative. – Mark Baker Jul 10 '13 at 16:26
  • My PHP_INT_MAX reports 9223372036854775807... – Pacerier Jul 10 '13 at 16:27
  • Have a look here http://stackoverflow.com/questions/3189040/what-is-the-maximum-length-of-a-string-in-php (and note this: `When PHP.net states "Note string can be as large as 2GB." php.net/manual/en/language.types.string.php do they mean it can go over 2GB? – Pacerier Jun 6 at 14:49`) – fvu Jul 10 '13 at 16:30
  • I see this behavior on 64-bit PHP. 1024*1024*1024 = 1073741824, 1073741824*2 = 2147483648, but `strlen()` reports that same value with the sign flipped negative. – Michael Berkowski Jul 10 '13 at 16:30
  • @fvu, yes *I'm* testing that behavior right now. – Pacerier Jul 10 '13 at 16:30
  • I'm getting the same output as the OP on 64-bit PHP. `PHP_INT_SIZE` === 8. – Dogbert Jul 10 '13 at 16:31
  • What if you use something other than var_dump? Or var_dump a comparison to zero? – jcsanyi Jul 10 '13 at 16:34
  • @jcsanyi Doing `echo strlen($str)` will report the same negative value as var_dump() here... – Michael Berkowski Jul 10 '13 at 16:34
  • What happens if you use one byte less than 2GB? `$str = str_repeat('a', 2 * $gb - 1);` Currently, you have the 32nd bit set, `0b10000000000000000000000000000000`. Subtracting one clears that top-most bit to give you `0b01111111111111111111111111111111`, also known as 2147483647. If you get the correct value out (2147483647), then my guess would be they're using signed integers instead of unsigned for the length of the string. – nickb Jul 10 '13 at 16:39
  • @nickb, `2 * $gb - 1` works as expected (we get length 2147483647, as documented in the PHP manual). The question is: shouldn't strlen report the length as zero instead of a negative number? – Pacerier Jul 10 '13 at 16:45
  • Documented return of 0 for strlen() is only for empty strings: there is no specifically documented behaviour for strings that exceed signed 32-bit integer length; but expected behaviour would be the same as adding to "wrap" from positive to negative: the puzzle (for me) is why it's using 32-bit ints instead of 64-bit – Mark Baker Jul 10 '13 at 16:51

2 Answers

The manual says: "string can be as large as 2GB."

It looks like the limit is in fact 2GB - 1 bytes. This works fine on my x64 box:

$str = str_repeat('a', 2 * 1024 * 1024 * 1024 -1);
echo $str[0];

... while this breaks:

$str = str_repeat('a', 2 * 1024 * 1024 * 1024);
echo $str[0];

What you are doing is simply undefined, and the manual should be corrected. I would have expected a warning too.
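
The lengths reported in the question are consistent with the real length being truncated to a signed 32-bit integer. Here is a minimal sketch of that arithmetic, not the engine's actual code: `wrapToInt32` is a hypothetical helper, and it assumes a 64-bit PHP build so that the inputs are still plain integers.

```php
<?php
// Hypothetical helper (not part of PHP): reinterpret the low 32 bits of $n
// as a signed 32-bit integer, the way a 32-bit C `int` would store it.
// Assumes a 64-bit PHP build, so values above 2^31 - 1 are still integers.
function wrapToInt32($n)
{
    $n &= 0xFFFFFFFF;          // keep only the low 32 bits
    if ($n & 0x80000000) {     // bit 31 set: negative when read as signed
        $n -= 0x100000000;     // subtract 2^32
    }
    return $n;
}

$gb = 1024 * 1024 * 1024;
foreach (array(2 * $gb, 4 * $gb, 123 + 4 * $gb, 6 * $gb, 123 + 8 * $gb) as $length) {
    printf("%d -> %d\n", $length, wrapToInt32($length));
}
// 2147483648 -> -2147483648
// 4294967296 -> 0
// 4294967419 -> 123
// 6442450944 -> -2147483648
// 8589934715 -> 123
```

These are the same values that `strlen()` printed in the question, which suggests the length is wrapping rather than being rejected.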

Interestingly, this raises a fatal error:

$str = str_repeat('a', 2 * 1024 * 1024 * 1024 -2); // 2GB - 2 bytes
$str .= 'b'; // ok
$str .= 'c'; // PHP Fatal error:  String size overflow


Update:

The bug report has been addressed. The documentation on php.net has been fixed and now states "2147483647 bytes maximum".
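
Until `str_repeat` rejects oversized results itself, one workaround is to check the resulting length before building the string. This is only a sketch, assuming the 2147483647-byte limit quoted above; `checkedStrRepeat` and `MAX_STRING_BYTES` are hypothetical names, not part of PHP.

```php
<?php
// Assumed limit, taken from the corrected manual text: 2147483647 bytes.
const MAX_STRING_BYTES = 2147483647;

// Hypothetical wrapper around str_repeat() that refuses to build a string
// longer than MAX_STRING_BYTES instead of silently producing a broken one.
function checkedStrRepeat($input, $multiplier)
{
    if ($multiplier < 0) {
        throw new InvalidArgumentException('Multiplier must not be negative');
    }
    // Computed before any allocation happens; if the product is too large
    // for an integer it becomes a float, and the comparison still works.
    $length = strlen($input) * $multiplier;
    if ($length > MAX_STRING_BYTES) {
        throw new LengthException("Result would be $length bytes, over the limit");
    }
    return str_repeat($input, $multiplier);
}

echo checkedStrRepeat('ab', 3), "\n"; // prints ababab
```

Throwing before the allocation mirrors the "String size overflow" fatal error that concatenation raises, but as an exception the caller can catch.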

Pacerier
RandomSeed
  • I followed your suggestion and submitted the bug report: https://bugs.php.net/bug.php?id=65239 – Pacerier Jul 10 '13 at 17:57
  • Yeah, for some reason, even though `echo $str[0];` doesn't work, the string still takes memory and we can operate on it (using trim etc.). – Pacerier Jul 10 '13 at 17:59
  • @Pacerier You may want to add the test case with the string concatenation to your bug report, it seems to support the idea that this is a documentation bug. Besides `str_repeat` should probably throw the same fatal error as when concatenating. – RandomSeed Jul 10 '13 at 18:07
  • At least `str_pad` does exit with an error. I think it's more than a documentation bug; it's essentially a `str_repeat` bug. I don't think there's another function that allows us to create strings of length > 2147483647, is there? – Pacerier Jul 10 '13 at 18:10

I suppose you're simply overflowing an int with your large string. From the manual:

The size of an integer is platform-dependent, although a maximum value of about two billion is the usual value (that's 32 bits signed). PHP does not support unsigned integers. Integer size can be determined using the constant PHP_INT_SIZE, and maximum value using the constant PHP_INT_MAX since PHP 4.4.0 and PHP 5.0.5.

So it should be OK as long as your string size fits into an int.

Jk1
  • There's no integer overflow, my PHP_INT_MAX reports 9223372036854775807 – Pacerier Jul 10 '13 at 16:34
  • Read the [manual for strings](http://php.net/language.types.string). It doesn’t matter what the maximum integer value is if strings have a different limit. – Gumbo Jul 10 '13 at 17:03
  • @Gumbo, strlen() returns an int, so you're bound by the int limit in what strlen can return. It doesn't matter what limit strings have. – Jk1 Jul 10 '13 at 17:22
  • @Jk1 [PHP uses `int` internally](http://lxr.php.net/xref/PHP_5_4/Zend/zend_builtin_functions.c#478) which can only represent values up to 2^31-1. – Gumbo Jul 10 '13 at 17:42
  • @Gumbo, that's the implementation of zend server, but it does not mean all PHP implementations need to follow that, right? – Pacerier Jul 10 '13 at 18:02
  • @Pacerier The Zend engine is the platform PHP is built on. – Gumbo Jul 10 '13 at 19:48
  • @Gumbo, What about HHVM? – Pacerier Mar 07 '15 at 23:54