I came across this seemingly very simple question the other day: "How to changing value in $array2 without referring $array1?" However, the more I looked into it, the odder it seemed that this is indeed functioning as intended. After this I started looking into the opcodes generated for the following code.

$array1 = array(2, 10);
$x = &$array1[1];
$array2 = $array1;
$array2[1] = 22;

echo $array1[1]; // Outputs 22

This seems crazy to me, since $array2 should only be a copy of $array1, and anything that happens to one array should not affect the contents of the other. Of course, if you comment out the second line, the final line echoes 10 as expected.
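To make the difference concrete, here is a minimal sketch that runs both variants side by side. The function names are hypothetical and exist only for this illustration; the bodies are the snippet above, with and without the reference line.

```php
<?php
// Hypothetical helper: the original snippet, including the element reference.
function withElementReference() {
    $array1 = array(2, 10);
    $x = &$array1[1];   // index 1 of $array1 becomes a reference
    $array2 = $array1;  // the copy keeps that reference slot
    $array2[1] = 22;    // writes through the shared slot
    return $array1[1];
}

// Hypothetical helper: the same snippet with the reference line removed.
function withoutElementReference() {
    $array1 = array(2, 10);
    $array2 = $array1;  // plain copy-on-write copy
    $array2[1] = 22;    // only $array2 changes
    return $array1[1];
}

var_dump(withElementReference());    // int(22)
var_dump(withoutElementReference()); // int(10)
```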

Looking further, I found a cool site that shows the opcodes PHP produces, using the Vulcan Logic Dumper. Here are the opcodes generated by the above code.

Finding entry points
Branch analysis from position: 0
Return found
filename:       /in/qO86H
function name:  (null)
number of ops:  11
compiled vars:  !0 = $array1, !1 = $x, !2 = $array2
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   3     0  >   INIT_ARRAY                                       ~0      2
         1      ADD_ARRAY_ELEMENT                                ~0      10
         2      ASSIGN                                                   !0, ~0
   4     3      FETCH_DIM_W                                      $2      !0, 1
         4      ASSIGN_REF                                               !1, $2
   5     5      ASSIGN                                                   !2, !0
   6     6      ASSIGN_DIM                                               !2, 1
         7      OP_DATA                                                  22, $6
   8     8      FETCH_DIM_R                                      $7      !0, 1
         9      ECHO                                                     $7
        10    > RETURN                                                   1

These opcodes aren't documented very well at http://php.net/manual/en/internals2.opcodes.php, but I believe that in English they are doing the following, line by line. This might be more for me than anyone else.

  1. Line 3: We initialize the array with its first value and then add 10 to it before assigning it to $array1.
  2. Line 4: Get a write-only? value from index 1 of the array and assign it by reference to $x.
  3. Line 5: Assign $array1 to $array2.
  4. Line 6: Assign to index 1 of $array2. OP_DATA, I am guessing, sets it to 22, although $6 is never returned. OP_DATA has absolutely no documentation and is not listed as an opcode anywhere I have looked.
  5. Line 8: Fetch a read-only value from index 1 of $array1 and echo it out.

Even working through the opcodes, I am not sure where this is going wrong. I suspect the lack of opcode documentation and my inexperience with opcodes are keeping me from figuring it out.

EDIT 1:

As pointed out by Mike in the first comment, the reference status of array elements is preserved when arrays are copied. This is documented in a note on the array article at http://php.net/manual/en/language.types.array.php#104064. Funnily enough, it is not flagged as a warning. What surprises me is that, if this is true, the reference status is not preserved for the following code, as you would expect it to be.

$array1 = array(2, 10);
$x = &$array1;
$array2 = $array1;
$array2[1] = 22;

echo $array1[1]; // Output is 10

So it seems this only happens when you assign single elements by reference, which makes this functionality even more confusing.
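A minimal sketch of that per-element behavior, reusing the variables from the first snippet: only the slot that was assigned by reference stays shared, while the other slot behaves like an ordinary copy. The extra write to index 0 is my addition for illustration.

```php
<?php
$array1 = array(2, 10);
$x = &$array1[1];   // only index 1 becomes a reference slot
$array2 = $array1;

$array2[0] = 99;    // plain value slot: only $array2 changes
$array2[1] = 22;    // reference slot: visible through $array1 and $x too

echo json_encode($array1), "\n"; // [2,22]
echo json_encode($array2), "\n"; // [99,22]
echo $x, "\n";                   // 22
```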

Why does PHP only preserve the reference status of array indexes when they are individually assigned?

EDIT 2:

I did some testing using HHVM today, and HHVM handles the first snippet of code how you would expect. I love PHP, but HHVM is looking better and better compared to the Zend Engine.

michael.schuett
  • See http://php.net/manual/en/language.types.array.php#104064 for why. The "shared" data stays shared. – Mike 'Pomax' Kamermans Sep 28 '14 at 02:49
  • That seems insane to me, as you are actually changing the array's structure when you assign even parts of it by reference. Plus, if you remove the [1] from line 2, the shared data does not stay shared, so this is not entirely true. – michael.schuett Sep 28 '14 at 02:55
  • the initial assignment is just an alias. It's not until you start manipulating them with independent operations like `...[]` that they're forced to diverge. It's certainly "unexpected", but I wouldn't go so far as to call it "insane". Though I'm happily of the opinion it wasn't a very good choice when it was made, and PHP's been stuck with it ever since. – Mike 'Pomax' Kamermans Sep 28 '14 at 03:28
  • @Mike'Pomax'Kamermans Thanks for taking the time to explain it to me. I have been looking over opcodes and different variations of this code for the last 3 hours before posting it. If you want to post an answer for this I will accept that, as it seems a few other people are also interested in this question. – michael.schuett Sep 28 '14 at 03:34
  • posted, and expanded a little. – Mike 'Pomax' Kamermans Sep 28 '14 at 03:44

1 Answer

This is explained over at the PHP manual (even if you have to spend more time than you should have to in order to find it), specifically over at http://php.net/manual/en/language.types.array.php#104064

The "shared" data stays shared, with the initial assignment just acting as an alias. It's not until you start manipulating the arrays with independent operations like ...[] = ... that the interpreter starts to treat them as divergent lists, and even then the shared data stays shared, so you can have two arrays with the same shared first n elements but divergent subsequent data.
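A small sketch of that "shared but divergent" state. The variable names are illustrative only; the point is that appending makes the two arrays diverge while the slot that was made a reference stays shared.

```php
<?php
$a = array(1, 2);
$ref = &$a[0];      // slot 0 of $a becomes a reference
$b = $a;            // the copy still shares slot 0 with $a

$b[] = 3;           // appending only affects $b
$b[0] = 'changed';  // writing to the shared slot affects both arrays

var_dump($a === array('changed', 2));    // bool(true)
var_dump($b === array('changed', 2, 3)); // bool(true)
```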

For a true "copy by value" for one array to another, you pretty much end up doing something like

$arr2 = array();
foreach($arr1 as $val) {
  $arr2[] = $val;
}

or

$arr2 = array();
for($i=count($arr1)-1; $i>-1; $i--) {
  $arr2[$i] = $arr1[$i];
}

(using reverse looping mostly because not enough people remember that's a thing you can do, and is more efficient than a forward loop =)
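One caveat worth checking for yourself: PHP arrays iterate in insertion order, so a copy filled back to front has the same key/value pairs but a different iteration order. The sketch below uses made-up sample data; ksort() is the standard way to restore key order if that matters.

```php
<?php
$arr1 = array('a', 'b', 'c');

// Reverse-loop copy, as in the answer above.
$arr2 = array();
for ($i = count($arr1) - 1; $i > -1; $i--) {
    $arr2[$i] = $arr1[$i];
}

var_dump($arr2 == $arr1);      // bool(true): same key/value pairs
var_dump($arr2 === $arr1);     // bool(false): different order
echo implode(',', $arr2), "\n"; // c,b,a

ksort($arr2);                   // restore ascending key order
var_dump($arr2 === $arr1);     // bool(true)
```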

You'd imagine there'd be an array_copy function or something to help deal with the array copy quirk, but there just doesn't seem to be one. It's odd, but one of those "the state of PHP" things. A choice was made in the past, PHP's lived with that choice for quite a few years as a result, so it's just "one of those things". Unfortunately!
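For what it's worth, such a helper is easy to sketch. The name array_copy_values below is hypothetical (it is not a PHP built-in); it assumes you want keys preserved and nested arrays copied too. Objects are still copied as handles, so this only breaks array element references.

```php
<?php
// Hypothetical helper, not part of PHP: copies an array by value,
// preserving keys and dropping any element references.
function array_copy_values(array $source) {
    $copy = array();
    foreach ($source as $key => $value) {
        // foreach by value hands us the dereferenced value;
        // recurse so nested arrays lose their references as well.
        $copy[$key] = is_array($value) ? array_copy_values($value) : $value;
    }
    return $copy;
}

$array1 = array('a' => 2, 'b' => 10);
$x = &$array1['b'];
$array2 = array_copy_values($array1);
$array2['b'] = 22;

var_dump($array1['b']); // int(10)
var_dump($array2['b']); // int(22)
```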

Mike 'Pomax' Kamermans
  • thanks for this... this issue had been bugging me as well (since seeing the original posting earlier in the week). I'm a real newbie with PHP so slow on the uptake, so looked into this a bit more on my blog: http://blog.adamcameron.me/2014/09/php-looking-at-some-interesting.html. Do I get the right end of the stick in my analysis here? – Adam Cameron Sep 28 '14 at 13:23
  • that's a nice writeup and sounds about right to me. Tweet it out at Zend and php_net and see if it gets some traction through retweets =) – Mike 'Pomax' Kamermans Sep 28 '14 at 17:01
  • @AdamCameron I tweeted your post at them to see if we can get some movement on this. I think it would be an awesome change for PHP 7. – michael.schuett Sep 28 '14 at 17:25
  • @AdamCameron I did some testing and HHVM does not have this issue. Yet another reason to make the switch to HHVM. – michael.schuett Sep 28 '14 at 18:05
  • Cheers fellas. TBH, once I understood what was going on, I don't see it as an issue per-se, just something that's not obvious until one takes a closer look (and for my part, I chalk that up to being a newbie). Perhaps that in itself suggests it *is* an issue, though? I'm out of my depth at this point. – Adam Cameron Sep 28 '14 at 19:49
  • @AdamCameron it really is not a huge issue, but it is also unexpected behavior that I don't think too many people would agree is an optimal solution in the current Zend engine. – michael.schuett Sep 28 '14 at 20:40
  • Reverse looping is more efficient than forward looping? Odd and interesting :) – Peter Sep 28 '14 at 20:54
  • @Peter... oh yeah, that was something else I was gonna apply a magnifying glass to... – Adam Cameron Sep 29 '14 at 05:04
  • fun fact: under integer conditions an `$i>-1` is a single bit check. But even without that, a reverse loop only uses a single variable and a constant comparison rather than two variables and a rereference on the conditional check. The main benefit is not having to invent a "start" and "end" variable though, we can just start at the "end" and we all know lists never go below index 0 so done =) – Mike 'Pomax' Kamermans Sep 29 '14 at 05:18
  • @Mike'Pomax'Kamermans readability > efficiency. Let's not forget Donald Knuth once said "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil". It's interesting that it is more efficient, but not more practical. – Adam Sep 29 '14 at 18:46
  • sorry, when did a loop become unreadable when it runs back to front, instead of front to back? (plus they're not exactly curiosities, you use them any time you need to modify a list during looping, since you won't affect element ordering for the part you've not "seen" yet with a reverse loop =) – Mike 'Pomax' Kamermans Sep 29 '14 at 20:35
  • @Mike'Pomax'Kamermans: in that you realised you needed to point out what you were doing and why ("using reverse looping...") demonstrates Adam's (not me, the other one) point Mike. Even *you* clearly think it's anomalous coding, therefore is less "clean" (in the Bob Martin sense). Plus the gains are inconsequential, so... I'd consign this to "curiosity" rather than "recommended practice". Still: it's good to know and has given me my blog topic this evening. – Adam Cameron Oct 01 '14 at 19:53
  • it's less known to people hitting up stackoverflow. It's hardly less known in the professional programming world ;) My comment was mostly because of that reason (people posting on SO are less likely to have ever seen it) – Mike 'Pomax' Kamermans Oct 01 '14 at 22:37
  • The only thing you are missing is that your "more efficient" for-loop reverses the order of the elements in the array. That can hardly be called a copy anymore. And both of your solutions will only work when operating on an array with continuous integer indexes. Having strings or holes in the index will break the array copy. – Sven Oct 05 '14 at 08:15
  • I think you're missing the elementary fact that `a2[i] = a1[i]` for all `i`, irrespective of the sequence you generate those `i` in, will always generate an exact copy, provided your sequence does indeed cover all `i`. Front to back, back to front, or permuted, is absolutely irrelevant. (And yes, if you have an array with holes you're obviously not going to use a blind iteration) – Mike 'Pomax' Kamermans Oct 05 '14 at 16:12