
PHP uses a copy-on-modification system.

Does $a = (string) $a; (where $a is already a string) modify and copy anything?


Specifically, this is my problem:

Parameter 1 is mixed: I want to allow non-strings to be passed and convert them to strings.
But sometimes these strings are very large, so I want to avoid copying a parameter that is already a string.

Can I use version Foo or do I have to use version Bar?

class Foo {
    private $_foo;
    public function __construct($foo) {
        $this->_foo = (string) $foo;
    }
}

class Bar {
    private $_bar;
    public function __construct($bar) {
        if (is_string($bar)) {
            $this->_bar = $bar;
        } else {
            $this->_bar = (string) $bar;
        }
    }
}
mzimmer

3 Answers


The answer is that yes, it does copy the string. Sort-of... Not really. Well, it depends on your definition of "copy"...

>= 5.4

To see what's happening, let's look at the source. In 5.5, the executor handles a variable cast like this:

    zend_make_printable_zval(expr, &var_copy, &use_copy);
    if (use_copy) {
        ZVAL_COPY_VALUE(result, &var_copy);
        // if optimized out
    } else {
        ZVAL_COPY_VALUE(result, expr);
        // if optimized out
        zendi_zval_copy_ctor(*result);
    }

As you can see, the call uses zend_make_printable_zval(), which simply short-circuits if the zval is already a string.

So the code that's executed to do the copy is the else branch, starting with:

ZVAL_COPY_VALUE(result, expr);

Now, let's look at the definition of ZVAL_COPY_VALUE:

#define ZVAL_COPY_VALUE(z, v)                   \
    do {                                        \
        (z)->value = (v)->value;                \
        Z_TYPE_P(z) = Z_TYPE_P(v);              \
    } while (0)

Note what that's doing. The string itself, which is stored in the ->value block of the zval, is NOT copied. It's only referenced: the pointer stays the same, so both zvals point at the same string data. What does get created is a new variable (the zval part that wraps the value).

Next comes the zendi_zval_copy_ctor call, which does some interesting things of its own. Note:

case IS_STRING:
    CHECK_ZVAL_STRING_REL(zvalue);
    if (!IS_INTERNED(zvalue->value.str.val)) {
        zvalue->value.str.val = (char *) estrndup_rel(zvalue->value.str.val, zvalue->value.str.len);
    }
    break;

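Basically, that means that an interned string won't be copied, but a non-interned string will be. So what's an interned string, and what does that mean? Roughly, it's a string the engine stores once in a shared table and reuses, such as a string literal known at compile time.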

<= 5.3

In 5.3, interned strings didn't exist. So the string is always copied. That's really the only difference...

Benchmark Time:

Well, in a case like this:

$a = "foo";
$b = (string) $a;

No copy of the string will happen in 5.4, but in 5.3 a copy will occur.

But in a case like this:

$a = str_repeat("a", 10);
$b = (string) $a;

A copy will occur for all versions. That's because in PHP, not all strings are interned...

Let's try it out in a benchmark: http://3v4l.org/HEelW

$a = "foobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisout";
$b = str_repeat("a", 300);

echo "Static Var\n";
testCopy($a);
echo "Dynamic Var\n";
testCopy($b);

function testCopy($var) {
    echo memory_get_usage() . "\n";
    $var = (string) $var;
    echo memory_get_usage() . "\n";
}

Results:

  • 5.4 - 5.5 alpha 1 (not including other alphas, as the differences are minor enough to not make a fundamental difference)

    Static Var
    220152
    220200
    Dynamic Var
    220152
    220520
    

    So the static var increased by 48 bytes, and the dynamic var increased by 368 bytes.

  • 5.3.11 to 5.3.22:

    Static Var
    624472
    625408
    Dynamic Var
    624472
    624840
    

    The static var increased by 936 bytes while dynamic var increased by 368 bytes.

So notice that in 5.3, both the static and the dynamic variables were copied. So the string is always duplicated.

But in 5.4 with static strings, only the zval structure was copied. Meaning that the string itself, which was interned, remains the same and is not copied...

One Other Thing

Another thing to note is that all of the above is moot. You're passing the variable as a parameter to the function and then casting it inside the function, so copy-on-write will be triggered by your line. Running it will always (well, in 99.9% of cases) trigger a variable copy. At best (interned strings) you're talking about a zval duplication and its associated overhead. At worst, you're talking about a string duplication...
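
As a rough illustration of that point, the sketch below compares a guarded pass-through (the Bar approach from the question) with an unconditional cast. The helper name to_string_without_cast() is hypothetical and not part of the question's code, and the byte counts printed depend on the PHP version, so no particular output is assumed here.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the question):
// returns the value unchanged when it is already a string, casts otherwise.
function to_string_without_cast($value)
{
    if (is_string($value)) {
        return $value; // no cast, so the engine can keep sharing the same string data
    }
    return (string) $value;
}

$large = str_repeat('a', 100000); // built at runtime

$before = memory_get_usage();
$kept = to_string_without_cast($large);
$afterGuard = memory_get_usage();
$cast = (string) $large;
$afterCast = memory_get_usage();

// Memory growth caused by each approach; values vary by PHP version.
echo "guarded: ", $afterGuard - $before, " bytes\n";
echo "cast:    ", $afterCast - $afterGuard, " bytes\n";
```

On the 5.x versions discussed above, the guarded path would be expected to grow by far less than the length of the string, while the cast may duplicate it. On newer versions the two numbers may well be the same.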

ircmaxell
  • I like knowing the internal how and why something works; not just that it does. +1 – David J Eddy Feb 28 '13 at 16:49
  • I had this feeling that it might be related to version number (hence I posted the version), glad you clarified all this. This is the correct (and very detailed!) answer. – Karoly Horvath Feb 28 '13 at 16:55
  • Awesome answer! The 'One Other Thing' part covers what I was looking for. Thanks! – mzimmer Feb 28 '13 at 18:31
  • I get the same memory usage `347992` for all four cases. Running PHP 7.0.0. Is there something wrong? (Your demo confirms this) – revo Feb 12 '16 at 13:20
  • @revo PHP7 changes things a bit, since strings are first-class entities, and they are not copied unless altered. – ircmaxell Feb 12 '16 at 16:03
  • @ircmaxell Sorry, I need some clarification on what you said. I'm not familiar with first-class entities and couldn't find any good information about them. – revo Feb 23 '16 at 16:54
  • @revo it's a phrase, meaning they have a dedicated data structure and management system for them. Meaning they are a defined type that is managed separately from the variables (unlike in 5.x, where it was tied to the variable) – ircmaxell Feb 23 '16 at 20:59

Your code doesn't actually do:

$a = (string)$a;

It's more like this, because copy-on-write semantics apply when the string is passed as a function argument:

$b = (string)$a;

There's a pretty big difference between those two statements. The first won't have any memory impact, whereas the second does ... usually.
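
To make the distinction concrete, here is a minimal sketch of why casting a parameter behaves like assigning to a second variable. The function name cast_inside() is hypothetical and stands in for the constructor in the question.

```php
<?php
// Hypothetical function for illustration; it mirrors a constructor
// that casts its argument.
function cast_inside($value)
{
    // $value is the function's own local variable, so this assignment
    // targets a different variable than the caller's $a.
    $value = (string) $value;
    return $value . '!';
}

$a = 'abc';
$result = cast_inside($a);

var_dump($a);      // string(3) "abc"  (the caller's variable is untouched)
var_dump($result); // string(4) "abc!"
```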

The following code does roughly what your code would do; some string is passed and you cast and assign it to another variable. It tracks increases in memory.

<?php

$x = 0;
$y = 0;

$x = memory_get_usage();

$s = str_repeat('c', 1200);

$y = memory_get_usage();

echo $y - $x, PHP_EOL;

$s1 = (string)$s;

$x = memory_get_usage();

echo $x - $y, PHP_EOL;

Results (5.4.9):

1360
1360

Results (5.3.19):

1368
1368

The assignment basically copies the whole string value.

Using string literals

When using a string literal, behaviour depends on the version:

<?php

$x = 0;
$y = 0;

$x = memory_get_usage();

$s = 'cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc';

$y = memory_get_usage();

echo $y - $x, PHP_EOL;

$s1 = (string)$s;

$x = memory_get_usage();

echo $x - $y, PHP_EOL;

Results (5.4.9):

152
136

Results (5.3.19):

1328
1328

The reason is that string literals are treated differently by the engine, as you can read in ircmaxell's answer.

Ja͢ck
  • hmm.. that's not what I see. Could you check my answer? – Karoly Horvath Feb 28 '13 at 16:00
  • @KarolyHorvath Well, ref counts are not the same thing as memory consumption though; I'm not sure if the refcount is related here. – Ja͢ck Feb 28 '13 at 16:01
  • this answer is **wrong**, and its proof has a flaw.. you reused the same variable name... See my answer. – Karoly Horvath Feb 28 '13 at 16:25
  • @KarolyHorvath As I said earlier, refcount !== memory consumption; creating a new symbol to reference the same value doesn't count. – Ja͢ck Feb 28 '13 at 16:28
  • @KarolyHorvath The point is that in your test script, we create a new variable `$b = (string) $a`. But the question is about `$a = (string) $a`. With no new variable and $a already being a string, no additional memory is allocated by PHP. – mzimmer Feb 28 '13 at 16:45
  • @KarolyHorvath Okay, using the type cast AND a new variable increases memory. Some internals thingy I suppose :) updated answer. – Ja͢ck Feb 28 '13 at 16:57
  • @MichelZimmer: but if you pass `$a` into a function and do this inside of the function, `$a = (string) $a` is the same as `$b = (string) $a` due to copy-on-write semantics. So yes, you are concerned with the second... – ircmaxell Feb 28 '13 at 17:42
  • @ircmaxell Yep, that's been incorporated at the top of my answer now :) – Ja͢ck Feb 28 '13 at 17:46

Surprisingly, it does create a copy:

$string = "TestMe";
debug_zval_dump($string);

$string2 = $string;
debug_zval_dump($string);

$string3 = $string;
debug_zval_dump($string);

$string4 = (string) $string;
debug_zval_dump($string);

$string5 = (string) $string;
debug_zval_dump($string);

Output:

string(6) "TestMe" refcount(2)
string(6) "TestMe" refcount(3)
string(6) "TestMe" refcount(4)
string(6) "TestMe" refcount(4)
string(6) "TestMe" refcount(4)

Another proof:

echo memory_get_usage(), PHP_EOL;

$s = str_repeat('c', 100000);
echo memory_get_usage(), PHP_EOL;

$s1 = $s;
echo memory_get_usage(), PHP_EOL;

$s2 = (string) $s;
echo memory_get_usage(), PHP_EOL;

Output:

627496
727664
727760  # small increase, new allocated object, but no string copy
827928  # oops, we copied the string...
Karoly Horvath
  • As mentioned in the other answers the internals changed from 5.3 to 5.4. Running the first code will now emit five times: _string(6) "TestMe" interned_. Same for the second code which will show no increasing memory usage. – lukas.j Apr 23 '23 at 12:07