26

Can someone explain how PHP implements associative arrays? What underlying data structure does PHP use? Does PHP hash the key and store it in some kind of hash map? I am curious because I was wondering what the performance of associative arrays is when inserting and searching for keys.

Gordon
  • 312,688
  • 75
  • 539
  • 559
  • 1
    I'll leave this link for someone else to grind through, but you can view the actual C source for PHP at [http://svn.php.net/viewvc/php/php-src/](http://svn.php.net/viewvc/php/php-src/) – Kyle Hale Oct 29 '08 at 16:48

5 Answers

8

The link in the highest-voted answer is broken and doesn't give much explanation.

PHP is written in C, and the underlying structure is just a C array. A C array is a contiguous chunk of memory indexed by integers: you can't have an element at index 0 followed directly by one at index 1000 without everything in between. To make associative array keys work, each key is converted to a proper C index by a hash function before the value is added to the C array.

For a full explanation, I found this link to be much more informative.

http://nikic.github.io/2012/03/28/Understanding-PHPs-internal-array-implementation.html
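
To make that concrete, here is a toy sketch of the idea in PHP itself. The real work happens in C (see `Zend/zend_hash.c`); the hash function below mimics the engine's times-33 string hash (DJBX33A), but the function name, constants, and table size here are illustrative only.

```php
<?php
// Toy model: map a string key to a slot in a plain, contiguous array.
// PHP's engine does this in C; this is only an illustration of the idea.

function toyHash(string $key): int {
    $hash = 5381;                                   // same seed the times-33 hash uses
    for ($i = 0, $len = strlen($key); $i < $len; $i++) {
        $hash = (($hash << 5) + $hash + ord($key[$i])) & 0x7FFFFFFF;  // hash * 33 + byte
    }
    return $hash;
}

$tableSize = 8;                                     // the engine keeps table sizes at powers of two
$slot = toyHash('user_id') & ($tableSize - 1);      // mask the hash down to a valid C index
echo "key 'user_id' lands in bucket $slot\n";
```

The value is then stored at that slot in the C array, with a collision strategy for keys that land in the same bucket.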

PressingOnAlways
  • 11,948
  • 6
  • 32
  • 59
  • What is the size of the underlying C array? Is the size increased and the keys rehashed if the array grows over time, like e.g. in Java's `HashMap`? Thank you! – tonix Jul 14 '21 at 13:16
  • @tonix you can look at the source code yourself - https://github.com/php/php-src/blob/master/Zend/zend_hash.c. It used to just use the `HashTable` datatype in php5, but everything is the Zend Engine now, so they use zend_hash, which still uses hash tables. You can read up more on it at: https://www.phpinternalsbook.com/php5/hashtables.html – PressingOnAlways Jul 17 '21 at 17:39
  • 1
    @tonix In short, yes. As with most hash tables, if an inserted element increases the container's load factor beyond the threshold defined by the implementation, the table allocates memory for a larger array and rehashes the keys (see the sketch after this comment thread). – Cy Rossignol Jul 22 '21 at 22:34
  • @CyRossignol Thank you for your reply! Rehashing all the keys sounds like an expensive `O(n)` operation. – tonix Aug 03 '21 at 12:46
  • 1
    @tonix You're right, it's a relatively expensive operation. Most generic hash tables _amortize_ this cost by allocating a bigger array than needed for one insertion so that subsequent insertions don't incur the overhead. From an algorithm analysis point-of-view, the cost of insertions approaches _O(1)_. – Cy Rossignol Aug 07 '21 at 09:38
  • @CyRossignol Thanks for your replies! Using the information I read here on SO and in other places, I came up with my own implementation of a linked hash map in PHP: https://github.com/tonix-tuft/linked-hash-map. In my case, I never rehash keys, but I use a multidimensional array to identify the bucket for a given key, and each dimension of this array has a length of a different prime number, in order to minimize collisions. This implementation allows using any PHP data type as key (whereas built-in PHP arrays allow only ints and strings). Hope someone can benefit from this data structure. – tonix Aug 09 '21 at 07:58
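
To visualize the grow-and-rehash step discussed in these comments, here is a small, self-contained PHP sketch. It is not how the engine does it (that lives in C in `zend_hash.c`); the bucket layout, the `crc32` hash, and the sizes are illustrative only.

```php
<?php
// Toy grow-and-rehash: each bucket is a small array of key => value pairs
// whose hashes landed in the same slot. When the table fills up, allocate
// a bigger table and recompute every key's slot once, an O(n) step.

function rehash(array $buckets, int $newSize): array {
    $bigger = array_fill(0, $newSize, []);
    foreach ($buckets as $bucket) {
        foreach ($bucket as $key => $value) {
            $slot = crc32((string) $key) & ($newSize - 1);  // recompute the slot for the new size
            $bigger[$slot][$key] = $value;
        }
    }
    return $bigger;
}

// Start with 4 slots, insert a few keys, then double to 8 slots.
$buckets = array_fill(0, 4, []);
foreach (['a', 'b', 'c', 'd'] as $key) {
    $buckets[crc32($key) & 3][$key] = strtoupper($key);
}
$buckets = rehash($buckets, 8);

// Because the table doubles each time it grows, the O(n) rehash happens
// rarely enough that insertions cost O(1) amortized.
```
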
7

It's a hash table. The type declaration and hashing function are here:
http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.h?view=markup

There are also a lightweight array and a linked list in the SPL (Standard PHP Library).
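
Presumably that refers to `SplFixedArray` and `SplDoublyLinkedList`; a quick usage sketch:

```php
<?php
// SPL alternatives to the general-purpose PHP array:

$fixed = new SplFixedArray(3);     // contiguous, integer-indexed, fixed size
$fixed[0] = 'a';
$fixed[1] = 'b';
$fixed[2] = 'c';

$list = new SplDoublyLinkedList(); // linked list; cheap push/pop at both ends
$list->push('first');
$list->push('second');
$list->unshift('zeroth');

foreach ($list as $value) {
    echo $value, PHP_EOL;          // zeroth, first, second
}
```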

chuck
  • 71
  • 1
  • 1
  • Source has moved to GitHub: https://github.com/php/php-src/blob/master/Zend/zend_hash.h – David Jun 16 '16 at 00:32
6

Well, for what it is worth, all PHP arrays are associative arrays.
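
For example, integer and string keys can coexist in the same array; a list-style array is just one whose keys happen to be 0, 1, 2, and so on:

```php
<?php
$arr = ['zero', 'one'];          // auto-indexed: keys 0 and 1
$arr['name'] = 'PHP';            // a string key in the very same array
$arr[10] = 'ten';                // an explicit, non-contiguous integer key

var_dump(array_keys($arr));      // 0, 1, "name", 10
```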

EBGreen
  • 36,735
  • 12
  • 65
  • 85
3

@EBGreen is correct.

That gives you some interesting performance problems, especially when treating an array as a list and using the [] (array add) operator. PHP doesn't seem to cache the largest numeric key and add one to it; instead, it seems to traverse all of the keys to find what the next numeric key should be. I've rewritten scripts in Python because of PHP's dismal array-as-a-list performance.

Associative arrays have the standard dict/hash performance overhead.
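
If you want to measure this on your own version of PHP, a rough micro-benchmark along these lines will do (the element count is arbitrary, and, as the comments below show, results vary a lot between PHP versions):

```php
<?php
// Rough micro-benchmark: appending with [] versus inserting with string keys.
$n = 100000;

$start = microtime(true);
$list = [];
for ($i = 0; $i < $n; $i++) {
    $list[] = $i;                      // append via the [] operator
}
$appendTime = microtime(true) - $start;

$start = microtime(true);
$assoc = [];
for ($i = 0; $i < $n; $i++) {
    $assoc['k' . $i] = $i;             // insert via a string key
}
$assocTime = microtime(true) - $start;

printf("append: %.4fs, string keys: %.4fs\n", $appendTime, $assocTime);
```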

jcoby
  • 4,210
  • 2
  • 29
  • 25
  • 3
    Are you sure about this? I've just run benchmarks on a test array of 1000 entries (copying to a new array, one by one), and if you don't specify the key for the new array, it's consistently 7% faster (on PHP 5.2.6) – JamShady Oct 29 '08 at 21:08
  • It's possible they've changed it recently. I was using 5.1 when I was doing the work. PHP's array was AWFUL when you're talking about 10k entries or more. – jcoby Oct 29 '08 at 21:46
  • 2
    AFAIK this is not the case, please compare: [A zend hash table has an element `nNextFreeElement` ...](http://stackoverflow.com/questions/3698743/how-to-find-the-next-numeric-index-of-an-existing-array/3698786#3698786) – hakre Oct 03 '11 at 09:22
  • @RickyMason. You probably wouldn't normally, but for thorough testing, calculating per item times for 10, 100, 1k and 10k would really highlight scalability performance issues, especially if there is a chance that 10k may have to be handled. – Patanjali Aug 27 '16 at 11:13
2

It's all hash tables, according to sources in various web forums: http://www.usenet-forums.com/php-language/15348-zend-engine-array-implementation.html

If you want to be sure, read the source, then compile it, but make sure you can trust your compiler (Warning: PDF, and unrelated, but very cool).

jakber
  • 3,549
  • 20
  • 20