24

I have found that array_key_exists is over 1000x slower than isset at check if a key is set in an array reference. Does anyone that has an understanding of how PHP is implemented explain why this is true?

EDIT: I've added another case that seems to point to it being overhead required in calling functions with a reference.

Benchmark Example

function isset_( $key, array $array )
{
    return isset( $array[$key] );
}

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array[$i] );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array_ref[$i] );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

Output

array_key_exists( $my_array ) 0.0056459903717
isset( $my_array ) 0.00234198570251
isset_( $my_array ) 0.00539588928223
array_key_exists( $my_array_ref ) 3.64232587814 // <~ what on earth?
isset( $my_array_ref ) 0.00222992897034
isset_( $my_array_ref ) 4.12856411934 // <~ what on earth?

I'm on PHP 5.3.6.

Codepad example.

Kendall Hopkins
  • 43,213
  • 17
  • 66
  • 89
  • 1
    Search the internet for [`isset vs. array_key_exists`](http://www.google.com/search?q=isset+vs.+array_key_exists) and you will find more resources than you're probably able to read. – hakre Jun 14 '11 at 00:40
  • 2
    @hakre I understand the differences between `isset` and `array_key_exists` (how they work), my question is why this performance hit is happening when they are used on references. – Kendall Hopkins Jun 14 '11 at 00:42
  • array_key_exists must check for a concrete key value, isset must not. If you must look for the concrete value of a reference, the reference must be resolved additionally. So there is more work to do which takes more time. Why it's so much more time I can not tell you, but you're as well unsetting references, and this will influence the amount of work. – hakre Jun 14 '11 at 00:47
  • 1
    @hakre I've updated my example with testing `function( $key, $array ) { return isset( $array[$key] ); }`. It is also slow, this seems to point to overhead required in calling a function w/ a reference. – Kendall Hopkins Jun 14 '11 at 00:51
  • 1
    I don't think your test is really showing you what you think. Userspace functions (like the one you created) is always going to be significantly slower than in-built functions. – Shane H Jun 18 '11 at 12:58

4 Answers4

8

At work I've got a VM instance of PHP that includes a PECL extension called VLD. This lets you execute PHP code from the commandline and rather than execute it, it returns the generated opcode instead.

It's brilliant at answering questions like this.

http://pecl.php.net/package/vld

Just in case you go this route (and if you're generally curious about how PHP works internally, i think you should) you should definitely install it on a virtual machine (that is, i wouldn't install it on a machine i'm trying to develop on or deploy to). And this is the command you'll use to make it sing:

php -d vld.execute=0 -d vld.active=1 -f foo.php

Looking at the opcodes will tell you a more complete story, however, I have a guess.... Most of PHP's built-ins make a copy of an Array/Object and act upon that copy (and not a copy-on-write either, an immediate copy). The most widely known example of this is foreach(). When you pass an array into foreach(), PHP is actually making a copy of that array and iterating on the copy. Whis is why you'll see a significant performance benefit by passing an array as a reference into foreach like this:

foreach($someReallyBigArray as $k => &$v)

But this behavior -- that passing in an explicit reference like that -- is unique to foreach(). So I would be very surprised if it made an array_key_exists() check any faster.

Ok, back to what I was getting at..

Most the built-ins take a copy of an array and act upon that copy. I am going to venture a completely unqualified guess that isset() is highly optimized and that one of those optimizations is perhaps to not do an immediate copy of an Array when its passed-in.

I'll try to answer any other questions you may have but you could probably read a lot of you google for "zval_struct" (which is the data structure in the PHP internals which stores each variable. It's a C struct (think.. an associative array) that has keys like "value", "type", "refcount".

Shane H
  • 3,263
  • 5
  • 26
  • 30
  • 1
    I don't think this really answers the question. While that tool looks very interesting and I'll have to give it a try, you answer is a guess at best. BTW `isset` isn't a function, it's a language construct (why it probably doesn't have the overhead showing up here, until you wrap it in a function). – Kendall Hopkins Jun 20 '11 at 06:11
  • 2
    Kendall, this is only a "guess" because I didn't grep the php-src tree to answer your question. It's a very informed guess. I've lately worked on more PHP Extenstion development (in C) than PHP code itself. And your comment about "language construct"... You just got that from the PHP manual? But you seem to not understand what that really even means. All it means is that, in the C code, it's not implemented as a global ZEND_FUNCTION. The "overhead" for calling isset() is no different than the overhead of calling, say, "is_a" which IS a ZEND_FUNCTION. – Shane H Jun 20 '11 at 12:34
  • Anyhow, believe what you'd like. You seem to have a hypothesis that you're attached to regardless of facts. And more than once you seem to think that a userspace function (that is, a function you create in PHP code) is somehow comparable to a language built-in. The only thing they have in common is that they're both named "function." – Shane H Jun 20 '11 at 12:36
3

Here is the source of the array_key_exists function for 5.2.17. You can see that even if the key is null, PHP attempts to compute a hash. Although it's interesting that if you remove

// $my_array_ref[$i] = NULL;

then it performs better. There must be multiple hash lookups occuring.

/* {{{ proto bool array_key_exists(mixed key, array search)
   Checks if the given key or index exists in the array */
PHP_FUNCTION(array_key_exists)
{
    zval **key,                 /* key to check for */
         **array;               /* array to check in */

    if (ZEND_NUM_ARGS() != 2 ||
        zend_get_parameters_ex(ZEND_NUM_ARGS(), &key, &array) == FAILURE) {
        WRONG_PARAM_COUNT;
    }

    if (Z_TYPE_PP(array) != IS_ARRAY && Z_TYPE_PP(array) != IS_OBJECT) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "The second argument should be either an array or an object");
        RETURN_FALSE;
    }

    switch (Z_TYPE_PP(key)) {
        case IS_STRING:
            if (zend_symtable_exists(HASH_OF(*array), Z_STRVAL_PP(key), Z_STRLEN_PP(key)+1)) {
                RETURN_TRUE;
            }
            RETURN_FALSE;
        case IS_LONG:
            if (zend_hash_index_exists(HASH_OF(*array), Z_LVAL_PP(key))) {
                RETURN_TRUE;
            }
            RETURN_FALSE;
        case IS_NULL:
            if (zend_hash_exists(HASH_OF(*array), "", 1)) {
                RETURN_TRUE;
            }
            RETURN_FALSE;

        default:
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "The first argument should be either a string or an integer");
            RETURN_FALSE;
    }

}
Dan Spiteri
  • 1,809
  • 1
  • 13
  • 14
  • I don't think a few more lookups could cause a 1000x slow down. I'm pretty sure the problem is a bit more than just the array_key_exists function anyway. As you can see in the example, it happens for any function (see isset_). – Kendall Hopkins Jul 01 '11 at 05:44
  • I think it's internal to zend_hash_index_exists() et al; accessing an unset element in general is slower than a set element. I've optimized production code in the past by testing first for truthy (!!$hash->$value) and if still not sure only then with array_key_exists(), it ran much faster. – Andras Dec 07 '14 at 22:31
  • the issue is the separation (aka duplication of the array…) – bwoebi Jul 25 '16 at 08:03
3

Since PHP >= 7.4 array_key_exists is now as fast as isset.

See https://www.php.net/manual/en/function.array-key-exists.php#125313

And some perfomance tests:

<?php declare(strict_types = 1);

function testPerformance($name, Closure $closure, $runs = 1000000)
{
    $start = microtime(true);
    for (; $runs > 0; $runs--)
    {
        $closure();
    }
    $end = microtime(true);

    printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}

$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
    $items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);

$assocItems = array_combine($items, array_fill(0, count($items), true));

$isset = function () use ($assocItems) {
    isset($items[1111111]);
};

$array_key_exists = function () use ($assocItems) {
    array_key_exists(1111111, $assocItems);
};

testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);

Output:

Function call isset took 0.00561 seconds
Function call array_key_exists took 0.00547 seconds
Arsen
  • 166
  • 1
  • 7
1

Not array_key_exists, but the removal of the reference (= NULL) causes this. I commented it out from your script and this is the result:

array_key_exists( $my_array ) 0.0059430599212646
isset( $my_array ) 0.0027170181274414
array_key_exists( $my_array_ref ) 0.0038740634918213
isset( $my_array_ref ) 0.0025200843811035

Only removed the unsetting from the array_key_exists( $my_array_ref ) part, this is the modified part for reference:

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array_ref );
    // $my_array_ref[$i] = NULL;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );
hakre
  • 193,403
  • 52
  • 435
  • 836
  • Then it's useless. The `= NULL` is just in place a larger value. I'll replace it w/ 0 in my example to make that more clear. – Kendall Hopkins Jun 14 '11 at 00:52
  • By removing that line, that function can probably return early (since there is no keys in the array). Besides it doesn't reflect the real-life problem I ran into (an ref array with keys in it). – Kendall Hopkins Jun 14 '11 at 00:53
  • 2
    Hmm, probably each time a reference changes, the hash is triggered to re-index the keys. – hakre Jun 14 '11 at 00:56
  • Hmm, why do you need a reference at all? – hakre Jun 14 '11 at 00:58
  • I think your right about the `= ` is doing something to kill the performance of the reference. I don't understand how it can require 1000x the work to do it though. – Kendall Hopkins Jun 14 '11 at 01:00
  • I found this bug while passing an array into a closure by reference because I wanted to change it in the closure. I simplified the case down so that closures wouldn't confuse people. – Kendall Hopkins Jun 14 '11 at 01:01
  • Look inside the PHP source code what is triggered by this. Then you probably find out. I do not know that, but I think this is related to memory management and copy on write plus the hash array structure. – hakre Jun 14 '11 at 01:02
  • I know there is ways to get around this, but I'm more interested in the WHY that this is happening, not how to fix it (though that would be nice also). – Kendall Hopkins Jun 14 '11 at 01:02
  • Hence why I was hoping with someone w/ knowledge of PHP internals would shed some light on this problem. – Kendall Hopkins Jun 14 '11 at 01:03
  • I think you can fix this by not using a reference. There is not much you can do else in an assignment on the PHP level I think, so no clue next to not using a reference for the how to fix. The why lies within the source, probably konforce knows that. – hakre Jun 14 '11 at 01:04
  • PHP references have been the source of much pain. I've had problems with segfaults, memory corruption and now this... – Kendall Hopkins Jun 14 '11 at 01:09
  • Why are you making use of those? You have not written much to that, normally those are not needed at all. – hakre Jun 14 '11 at 01:18
  • Sometimes they are nice to use when you need to temporally treat an array like an object or if you need to make an array reference itself. But I'll agree, most of the time there is a better way. – Kendall Hopkins Jun 14 '11 at 01:39
  • The part about temporarily treat an array like an object sounds interesting. How do you do that with references? – hakre Jun 14 '11 at 09:47
  • I just meant that arrays default to pass by value and objects pass by references and sometimes it's preferable to switch their behavior. – Kendall Hopkins Jun 14 '11 at 13:38
  • You should checkout [ArrayObject](http://php.net/manual/en/class.arrayobject.php) and probably make use of the [ArrayAccess](http://php.net/manual/en/class.arrayaccess.php) interface. I think that's worth to try out. An object that behaves similar to an array. – hakre Jun 14 '11 at 13:42