
This question is mostly for my own benefit, as I always like to write optimized code that can also run on cheap, slow servers (or servers with A LOT of traffic).

I looked around and I was not able to find an answer. I was wondering what is faster between those two examples keeping in mind that the array's keys in my case are not important (pseudo-code naturally):

<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
    if(!in_array($new_val, $a)){
        $a[] = $new_val;
        //do other stuff
    }
}
?>

<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
    if(!isset($a[$new_val])){
        $a[$new_val] = true;
        //do other stuff
    }
}
?>

Since the point of the question is not array key collisions, I would like to add that if you are worried about colliding inserts for $a[$new_val], you can use $a[md5($new_val)]. It can still cause collisions, but it would mitigate a possible DoS attack when reading from a user-provided file (http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html).

starball
Fabrizio
  • If you are always striving to write optimized code, you're surely using a profiler then once in a while? – mario Nov 20 '12 at 22:21
  • Ummm what is the `while` for? – Naftali Nov 20 '12 at 22:22
  • I do use profilers, and the while is not the core of the question; it is merely about isset vs. in_array. The profiler will give me an answer based on the instance/server/available memory/available CPU, etc. I want to write code that keeps those things in mind but is also more portable. Mainly, I want to know what happens in the background – Fabrizio Nov 20 '12 at 22:23
  • It should be noted that unless the array keys and values are the same, `isset($a[$new_val])` is not the same as `in_array($new_val, $a)`. – Jason McCreary Nov 20 '12 at 22:43
  • @JasonMcCreary, you are right, but `in_array($new_val, $a)` where `$a[] = $new_val;` is the same as `isset($a[$new_val])` where `$a[$new_val] = true` – Fabrizio Nov 20 '12 at 22:45
  • I vote to reopen. The question is well formed and answers are supported with facts and references. While a *micro*-optimization, these types of questions are *constructive*. – Jason McCreary Nov 21 '12 at 13:41
  • @JasonMcCreary second; just one more. – Ja͢ck Nov 22 '12 at 02:15
  • This is many years later, but I wouldn't even consider this a micro optimization. For large data sets it can make a ton of difference!! – Robert Oct 11 '16 at 22:12
  • Agreed that it is not a micro optimization: using isset() instead of in_array() saved me minutes of execution time on datasets containing more than 20.000 entries – Vincent Nov 22 '17 at 14:49
  • ...this question looks "constructive" to me. I'll start another re-opening campaign. – mickmackusa May 24 '18 at 23:41
  • in_array() checks if a value exists in an array. isset() checks if an array has a key. in_array('key', $array) != isset($array['key']), because keys and values can be different – ustmaestro Jun 11 '18 at 23:51
  • $ar = ['a','b','c']; isset($ar['a']) === false and in_array('a', $ar) === true – ustmaestro Jun 12 '18 at 00:04
  • So what happens when you rise in the ranks on stackoverflow - don't you just get a convenient button with "open" on it? Perhaps certain things remain employees only, or else we are in great need of _active_ moderators that notice which way the thumbs point. ;) – Christoffer Bubach Aug 16 '20 at 05:04
  • You don't have the possibility to do a strict comparison with isset. With in_array it's possible with the third parameter 'strict' – Skip Nov 23 '22 at 10:30

4 Answers


The answers so far are spot-on. Using isset in this case is faster because

  • It uses an O(1) hash search on the key whereas in_array must check every value until it finds a match.
  • Being an opcode, it has less overhead than calling the in_array built-in function.

These can be demonstrated by using an array with values (10,000 in the test below), forcing in_array to do more searching.

isset:    0.009623
in_array: 1.738441

This builds on Jason's benchmark by filling in some random values and occasionally finding a value that exists in the array. All random, so beware that times will fluctuate.

$a = array();
for ($i = 0; $i < 10000; ++$i) {
    $v = rand(1, 1000000);
    $a[$v] = $v;
}
echo "Size: ", count($a), PHP_EOL;

$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    isset($a[rand(1, 1000000)]);
}

$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;

$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    in_array(rand(1, 1000000), $a);
}

$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;
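
One possible variation on the benchmark above, sketched under a few assumptions: the random number generator is seeded so runs are repeatable, and the values to search for are generated up front so that both loops test exactly the same inputs and the cost of generating random numbers stays outside the timed regions. The seed 42 and the variable names ($haystack, $needles, $hitsIsset, $hitsInArray) are illustrative choices, not part of the original test.

```php
<?php
mt_srand(42); // fixed seed (arbitrary choice) so every run uses the same data

// Same shape as the original test: each value is stored under itself as key.
$haystack = array();
for ($i = 0; $i < 10000; ++$i) {
    $v = mt_rand(1, 1000000);
    $haystack[$v] = $v;
}

// Pre-generate the values to look up, outside the timed regions.
$needles = array();
for ($i = 0; $i < 10000; ++$i) {
    $needles[] = mt_rand(1, 1000000);
}

// Keyed lookup.
$hitsIsset = 0;
$start = microtime(true);
foreach ($needles as $n) {
    if (isset($haystack[$n])) {
        ++$hitsIsset;
    }
}
$issetTime = microtime(true) - $start;

// Value search (strict, so integers are compared as integers).
$hitsInArray = 0;
$start = microtime(true);
foreach ($needles as $n) {
    if (in_array($n, $haystack, true)) {
        ++$hitsInArray;
    }
}
$inArrayTime = microtime(true) - $start;

echo "Hits (isset / in_array): ", $hitsIsset, " / ", $hitsInArray, PHP_EOL;
echo "isset:    ", number_format($issetTime, 6), PHP_EOL;
echo "in_array: ", number_format($inArrayTime, 6), PHP_EOL;
```

Because keys and values are identical here, both loops should report the same number of hits, which is a cheap sanity check that the two timings measure the same work. Absolute times will still vary between machines and PHP versions.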
David Harkness
  • I know about hashes, but I'm wondering why something similar is not done on array values when possible to speed up functions. It would also reduce memory consumption if similar values are used, by simply adding an extra hash on the value.. correct? – Fabrizio Nov 20 '12 at 23:00
  • @Fabrizio - Array values can be duplicated and contain non-hashable objects. Keys must be unique and can only be strings and integers which makes them easily hashable. While you could create a one-to-one map that hashes both keys and values, this isn't how PHP's array works. – David Harkness Nov 20 '12 at 23:29
  • In case you are sure that your array contains unique values then there is another option - *flip + isset*. – Arkadij Kuzhel Aug 20 '15 at 12:47
  • Worth noting a flipped isset is still faster in this example than in_array: `$start = microtime( true); $foo = array_flip($a); for ($i = 0; $i < 10000; ++$i) { isset($foo[rand(1, 1000000)]); } $total_time = microtime( true ) - $start; echo "Total time (flipped isset): ", number_format($total_time, 6), PHP_EOL;` – Andre Baumeier Jan 17 '17 at 08:41
  • @AndreBaumeier Which is faster will depend on the size of the array and how many tests you'll make. Flipping a ten thousand element array to perform three tests is probably not efficient. – David Harkness Jan 17 '17 at 09:23
  • I know this is all a million years old BUT the size of the array you're flipping doesn't make a difference in Andre's code. However, a better test would be to flip the array inside the loop. In that case array flip performs poorly. – jonlink Sep 11 '21 at 00:49

Which is faster: isset() vs in_array()

isset() is faster.

While it should be obvious, isset() only tests a single key, whereas in_array() iterates over the array, testing the value of each element until it finds a match.
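
To make that difference concrete, here is a minimal sketch. The helper linear_search_steps() is hypothetical: it is a hand-written scan that counts comparisons and only mimics the idea behind in_array(), not PHP's internal implementation. The arrays $values and $set are likewise made up for illustration.

```php
<?php
// Hypothetical helper: a hand-written linear scan that counts comparisons.
// It imitates the idea of in_array(); it is not PHP's internal code.
function linear_search_steps(array $haystack, $needle): int
{
    $steps = 0;
    foreach ($haystack as $value) {
        ++$steps;
        if ($value === $needle) {
            break; // found: stop scanning
        }
    }
    return $steps;
}

$values = range(1, 1000);      // list: the values 1..1000
$set    = array_flip($values); // the same values stored as keys

echo linear_search_steps($values, 1), PHP_EOL;    // 1    (first element)
echo linear_search_steps($values, 1000), PHP_EOL; // 1000 (last element)
echo linear_search_steps($values, 5000), PHP_EOL; // 1000 (missing: every element checked)

// The keyed lookup goes straight to the key; it does not walk the values.
var_dump(isset($set[1000])); // bool(true)
var_dump(isset($set[5000])); // bool(false)
```

The scan cost grows with the position of the match (or with the array size when there is no match), while the keyed lookup does not depend on how many other entries the array holds.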

Rough benchmarking is quite easy using microtime().

Results:

Total time isset():    0.002857
Total time in_array(): 0.017103

Note: Results were similar regardless of whether the key existed or not.

Code:

<?php
$a = array();
$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    isset($a['key']);
}

$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;

$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    in_array('key', $a);
}

$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;

exit;

Additional Resources

I'd encourage you to also look at:

Jason McCreary
  • Nice solution. I'm surprised more people don't split-time their functions/code more using `microtime()` or other tools. Incredibly valuable. – nickhar Nov 20 '12 at 22:32
  • Searching an empty array for the same key only highlights the overhead of calling the `in_array` function versus using the `isset` built-in. This would be better with an array containing a bunch of random keys and occasionally searching for an existing key/value. – David Harkness Nov 20 '12 at 22:35
  • I do use benchmarks and microtime quite a bit, but I also realized, while I was testing `while` and `foreach`, that at each refresh I was getting different "winners". It always depends on too many server variables, and the best approach is to iterate a very large number of times at different times and pick the one that wins more often, or just know what is happening in the background and know that it will be the final winner no matter what – Fabrizio Nov 20 '12 at 22:37
  • @David Harkness, you've already nit-picked my answer. If you want more, stand on my shoulders and post your own answer. :) Nonetheless, if the function overhead is already significantly more expensive relative to `isset()`, what makes you think passing it a *larger* array would make it *faster*? – Jason McCreary Nov 20 '12 at 22:38
  • @JasonMcCreary - Your answer is already quite good! Okay, I'll add a version with random elements in the array. – David Harkness Nov 20 '12 at 22:42
  • @JasonMcCreary, actually I do see what is trying to say. Your example shows even more that in_array is slower, but again what does it happen "internally" ? does one uses the trai and the other scan the array or... ? – Fabrizio Nov 20 '12 at 22:43
  • @JasonMcCreary - Using a filled array would make `in_array` much *slower.* My point was only that the overhead is more significant with an empty array. – David Harkness Nov 20 '12 at 22:49
  • Doesn't `isset()` check all the keys anyway? not sure what you mean with checking only one variable. when you do `$a[$k]` PHP recognize $a as an array, then has to find the value associate with the key $k contained in that array, meaning it has to "scan" the array keys.. correct ? – Fabrizio Nov 20 '12 at 22:51
  • @Fabrizio - Read up on [hashing functions](http://en.wikipedia.org/wiki/Hash_function) and [hash tables](http://en.wikipedia.org/wiki/Hash_table). – David Harkness Nov 20 '12 at 22:54
  • @DavidHarkness I know about hashes, but I'm wondering why something similar is not done on array values when possible to speed up functions. It would also reduce memory consumption if similar values are used, by simply adding an extra hash on the value.. correct? – Fabrizio Nov 20 '12 at 22:57

Using isset() takes advantage of speedier lookup because it uses a hash table, avoiding the need for O(n) searches.

The key is first hashed using the djb hash function to determine the bucket of similarly hashed keys in O(1). The bucket is then searched iteratively until the exact key is found, which is O(n) in the worst case, where n is the number of keys that landed in that bucket.
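
The two steps can be sketched in userland PHP. This is a toy model only: djbx33a() below is a simplified "times 33" hash truncated to 32 bits (assuming 64-bit PHP integers), and BUCKET_COUNT, bucket_index() and toy_isset() are hypothetical names invented for the illustration. PHP's real hash table lives in C and differs in detail.

```php
<?php
// Toy "times 33" string hash, truncated to 32 bits.
// Assumes 64-bit PHP integers; not PHP's actual internal function.
function djbx33a(string $key): int
{
    $hash = 5381;
    for ($i = 0, $len = strlen($key); $i < $len; ++$i) {
        $hash = ($hash * 33 + ord($key[$i])) & 0xFFFFFFFF;
    }
    return $hash;
}

const BUCKET_COUNT = 8; // hypothetical and deliberately tiny

// Step 1, O(1): hash the key to pick a bucket.
function bucket_index(string $key): int
{
    return djbx33a($key) % BUCKET_COUNT;
}

// Step 2: scan only the chosen bucket for an exact key match.
function toy_isset(array $buckets, string $key): bool
{
    foreach ($buckets[bucket_index($key)] as $pair) {
        if ($pair[0] === $key) {
            return true;
        }
    }
    return false;
}

// Build the toy table: each bucket is a short list of [key, value] pairs.
$buckets = array_fill(0, BUCKET_COUNT, array());
foreach (array('a@example.com', 'b@example.com', 'c@example.com') as $email) {
    $buckets[bucket_index($email)][] = array($email, true);
}

var_dump(toy_isset($buckets, 'b@example.com')); // bool(true)
var_dump(toy_isset($buckets, 'z@example.com')); // bool(false)
```

However many keys the table holds, a lookup inspects only the entries that share the key's bucket, which is why deliberately colliding keys are the one case where the scan becomes expensive.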

Barring any intentional hash collisions, this approach yields much better performance than in_array().

Note that when using isset() in the way that you've shown, passing the final values to another function requires using array_keys() to create a new array. A memory compromise can be made by storing the data in both the keys and values.
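
A minimal sketch of both variants, assuming a small hard-coded $incoming list in place of the real source of addresses (the variable names are illustrative):

```php
<?php
// Hypothetical input; in the question this would come from a file or query.
$incoming = array('a@example.com', 'b@example.com', 'a@example.com', 'c@example.com');

// Variant 1: keys only. Cheap to build, but a plain list of the
// addresses requires array_keys(), which creates a new array.
$seen = array();
foreach ($incoming as $email) {
    if (!isset($seen[$email])) {
        $seen[$email] = true;
    }
}
$uniqueFromKeys = array_keys($seen);

// Variant 2: store each address as both key and value. This uses more
// memory, but the values can be consumed directly.
$seenBoth = array();
foreach ($incoming as $email) {
    if (!isset($seenBoth[$email])) {
        $seenBoth[$email] = $email;
    }
}

echo implode(', ', $uniqueFromKeys), PHP_EOL; // a@example.com, b@example.com, c@example.com
echo implode(', ', $seenBoth), PHP_EOL;       // a@example.com, b@example.com, c@example.com
```

One caveat for other kinds of data: PHP converts numeric-string keys such as "123" to integers, so array_keys() would return 123 as an integer. Email addresses are not affected, but storing the original value alongside the key (variant 2) avoids the issue altogether.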

Update

A good way to see how your code design decisions affect runtime performance is to check out the compiled version of your script:

echo isset($arr[123])

compiled vars:  !0 = $arr
line     # *  op                           fetch      ext  return  operands
-----------------------------------------------------------------------------
   1     0  >   ZEND_ISSET_ISEMPTY_DIM_OBJ              2000000  ~0      !0, 123
         1      ECHO                                                 ~0
         2    > RETURN                                               null

echo in_array(123, $arr)

compiled vars:  !0 = $arr
line     # *  op                           fetch      ext  return  operands
-----------------------------------------------------------------------------
   1     0  >   SEND_VAL                                             123
         1      SEND_VAR                                             !0
         2      DO_FCALL                                 2  $0      'in_array'
         3      ECHO                                                 $0
         4    > RETURN                                               null

Not only does in_array() use a relatively inefficient O(n) search, it also needs to be called as a function (DO_FCALL) whereas isset() uses a single opcode (ZEND_ISSET_ISEMPTY_DIM_OBJ) for this.

Community
Ja͢ck

The second would be faster, as it looks only for that specific array key and does not need to iterate over the array until the value is found (in_array() will look at every array element if the value is not found).

Mike Brant
  • but it also depends on the whereabouts of a searched var in global scope – el Dude Nov 20 '12 at 22:28
  • @EL2002, can you please elaborate on that statement? – Fabrizio Nov 20 '12 at 22:34
  • Mike, wouldn't it be looking at the whole array even with `isset()` if the key is not found? – Fabrizio Nov 20 '12 at 22:46
  • @Fabrizio No you wouldn't as the lookup would be done directly on the array key specified. You don't need to iterate the array to evaluate the keys. – Mike Brant Nov 20 '12 at 22:48
  • @Fabrizio No, it doesn't need to iterate. Internally (in C) the PHP array is just a hash table. In order to look up a single index value, C just makes a hash of that value and looks up its assigned location in memory. There is either a value there or there isn't. – Mike Brant Nov 20 '12 at 23:02
  • @Fabrizio This article provides a good overview of how arrays are internally represented in C by PHP. http://nikic.github.com/2012/03/28/Understanding-PHPs-internal-array-implementation.html – Mike Brant Nov 20 '12 at 23:04