
PHP:

$a = array("key" => 23);
var_dump($a);

$c = &$a["key"];
var_dump($a);

unset($c);
var_dump($a);

Output:

array(1) {
  ["key"]=>
  int(23)
}
array(1) {
  ["key"]=>
  &int(23)
}
array(1) {
  ["key"]=>
  int(23)
}

In the second dump the value of "key" is shown as a reference. Why is that? If I do the same with a normal variable instead of an array key this does not happen.

My only explanation would be that array keys are usually stored as references and as long as there is only one entry in the symbol table it is shown as a scalar in the dump.

Megatron
  • Yes, it seems as if PHP recognizes that you are using a reference and so replaces the value in the array with the same reference too, so that if you update either of the two values, the other gets updated as well. – TiMESPLiNTER Sep 29 '14 at 11:38
  • Though this does not happen if I use a normal variable, not an array key. – Megatron Sep 29 '14 at 11:42
  • Duplicate of http://stackoverflow.com/q/17528280/476; however, the below answer is so good that I don't want to close it. – deceze Sep 29 '14 at 12:33

1 Answer


Internally, PHP arrays are hash maps (or dictionaries, or HashTables, or whatever you want to call them). Even a numerically indexed array is implemented as a hash table, and the array itself is held in a zval, just like any other value.
However, what you're seeing is expected behaviour, which is explained both here and here.

Basically, this is what your array looks like internally (these are the PHP 5 structures):

typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
} zval;
// zvalue_value:
typedef union _zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;
    zend_object_value obj;
} zvalue_value;

In the case of an array, zval.type is set to indicate that the value is an array, and so the zvalue_value.ht member is used.
What happens when you write $c = &$a['key'] is that the zval assigned to $a['key'] is updated: zval.refcount__gc is incremented, and is_ref__gc is set to 1. The value is not copied; instead it is now used by more than one variable, which means this value is a reference. Once you unset($c), the refcount is decremented, the reference is lost, and so is_ref is set back to 0.
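
To make the difference between the two cases easy to compare, here is a minimal sketch that captures the var_dump() output as strings. The helper name dump_to_string is hypothetical, introduced only for this illustration; it relies on the standard output-buffering functions ob_start() and ob_get_clean().

```php
<?php
// Hypothetical helper: capture var_dump() output as a string.
function dump_to_string($value)
{
    ob_start();
    var_dump($value);
    return ob_get_clean();
}

// Case 1: reference to an array element.
$a = array("key" => 23);
$c = &$a["key"];
$withRef = dump_to_string($a);     // the element is printed as &int(23)

unset($c);
$afterUnset = dump_to_string($a);  // the "&" marker is gone again

// Case 2: reference to a plain variable.
$x = 23;
$y = &$x;
$plain = dump_to_string($x);       // printed as int(23), without "&"

echo $withRef, $afterUnset, $plain;
```

The marker only shows up for the element inside the array. When a plain variable is dumped directly, var_dump() just reports its value.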

Now for the big one: why don't you see the same thing when you use regular, scalar variables? Well, that's because an array is a HashTable, complete with its own internal ref-counting (zval_ptr_dtor). Once an array itself is no longer referenced, it too should be destroyed. If you create a reference to an array value and then unset the array, the zval would normally be GC'ed along with it. But that would mean you have a reference to a destroyed zval floating around.
Therefore, the zval in the array is changed to a reference, too: a reference can be deleted safely. So if you were to do this:

$foo = array(123);
$bar = &$foo[0];
unset($foo[0]);
echo $bar, PHP_EOL;

Your code will still work as expected: $foo[0] no longer exists, but $bar is now the only existing reference to 123.
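
The same holds if the whole array is destroyed rather than a single element. A small sketch along the same lines (the variable names $stillThere and $fooIsGone are only there for illustration):

```php
<?php
$foo = array(123, 456);
$bar = &$foo[0];             // $bar and $foo[0] now share one value

unset($foo);                 // destroy the whole array, not just one element

$stillThere = $bar;          // the shared value outlives the array
$fooIsGone  = !isset($foo);  // the array variable itself no longer exists

echo $stillThere, PHP_EOL;   // prints 123
```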

This is just a really, really short and incomplete explanation. Search for material on the PHP internals: how the memory management works, how references are dealt with internally, and how the garbage collector uses the is_ref and refcount members to manage memory.
Pay special attention to internal mechanisms like copy-on-write, and (when looking through the first link I provided here) look for the snippet that looks like this:

$ref = &$array;
foreach ($ref as $val) {}

That snippet deals with some oddities in how references and arrays interact.
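
One such oddity, documented in the PHP manual's section on arrays, is easy to reproduce. This is a different example from the foreach snippet above, shown only as a sketch of how copy-on-write interacts with an element that has been turned into a reference:

```php
<?php
$original = array(1, 2);
$ref = &$original[0];    // element 0 is now a reference

$copy = $original;       // by-value copy of the array
$copy[0] = 'changed';    // writes through the shared reference
$copy[1] = 'changed';    // ordinary copy-on-write: only $copy is affected

echo $original[0], PHP_EOL;  // prints "changed"
echo $original[1], PHP_EOL;  // prints 2
```

Copying the array does not copy the referenced element: both arrays keep pointing at the same value, so a write through either one is visible in the other.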

Elias Van Ootegem
  • That was quite elaborate, thank you. After reading through the internals of PHP I am still confused about why this does not happen with normal variables but only with keys in arrays. According to the zval structure, every variable whose contents are referenced more than once should be shown as a reference by var_dump. Still, it only happens when I reference a key in an array. – Megatron Sep 29 '14 at 12:34
  • @Megatron: It has to do with the refcounting that is done for the HashTable (the array). Because the value you reference should persist even after the array is GC'ed, the array will hold a reference to the `zval`, not the `zval` itself. The actual zval is set aside, and will only be GC'ed if no variable references it (grossly oversimplified, but you get the idea) – Elias Van Ootegem Sep 29 '14 at 12:52
  • Ah clever, I didn't think of that. Now everything seems clear to me. Thank you :) – Megatron Sep 29 '14 at 14:53
  • @Megatron: You're welcome. [pedantic]Please do note that, in the help section, you're asked not to post _"thank you"_ comments, but rather vote for answers you found helpful (accept and/or up-votes) – Elias Van Ootegem Sep 29 '14 at 14:55