1

While I find most of PHP criticism pedantic at best, lack of a clear datatructures line up is becoming a practical limitation on my everyday job. The array() constructor creates data-structures that claim to do everything, but in reality I end up lacking relevant information that I would need in order to use it efficiently. Specifically, I have no idea of what data-structure it really is. Is it a list? What kind of list? An array of pointers? A btree? An hashmap?

How is lookup performed? Since the same datastructure has numerical and 'associative' lookup, I assume cannot perform lookups based on offset like one can do, say, in a C array.

For small amounts of data, I naturally do not care much about such performance optimizations. However, the software I work on it is started to be used in scenarios where moderately big data structures slow things down a bit.

Moreover, is it possible to explicitly create each mentioned datastructure? How?

Pico
  • 593
  • 1
  • 5
  • 15
  • 1
    You might find this question useful: http://stackoverflow.com/questions/2350361/how-is-the-php-array-implemented-on-the-c-level – templatetypedef Feb 08 '13 at 00:30
  • An alternative would be to offshore any difficult work to a language where you know how to write efficiently – Patashu Feb 08 '13 at 00:40

2 Answers2

2

In PHP, arrays are internally represented as doubly-linked lists. There's some SPL classes that let you create other data structures.

Part of the magic of PHP is not having to select different data structures, but as you've observed it also comes with performance limitations.

quickshiftin
  • 66,362
  • 10
  • 68
  • 89
  • 2
    The link you've provided explicitly contradicts your answer. It says that PHP arrays are implemented as ordered hash tables. – templatetypedef Feb 08 '13 at 01:02
  • Pickup a copy of [Sara Goldman's Extending & Embedding PHP](http://www.amazon.com/Extending-Embedding-PHP-Sara-Golemon/dp/067232704X). Page 93 "A HashTable is a specialized form a doubly linked list that adds the speed and efficiency of vectors in the form of lookup indicies." Trust me, they're doubly linked lists under the hood; even if they appear as hash tables in userspace. – quickshiftin Feb 08 '13 at 01:08
  • 1
    While I see what you are saying, hash tables are fundamentally different data structures than doubly-linked lists and have totally different performance characteristics. It looks like PHP uses hash tables with a doubly-linked list threaded through them in order to get the performance benefits of both structures. It's a clever idea! – templatetypedef Feb 08 '13 at 01:18
  • Yes, I understand they are very different; I have a computer science degree. AFAIR 'everything' in PHP is a hash table (per Sara "all userspace variables are stored in `HashTables` as `zval*` pointers. In later chapters you'll see how the Zend Engine uses `HashTables` to store userspace functions, classes, resources, autoglobal labels and other structures as well"); but beneath that even, the underpinnings are doubly linked lists, per the earlier comment. I'm not sure how the algorithmic complexity shakes out in the real world with this hybrid setup. Would be a great exercise to find out. – quickshiftin Feb 08 '13 at 01:36
1

I'd suggest that it's best to view arrays as a hash table implementation primarily, as if you spend a little time with the source you'll notice that whilst Zend HTs can be traversed as DLLs most of the operations you might be interested to compare (add, random access) are typical HT implementations.

Where we're dealing purely in numerical keys then it's essentially a pass-through to a C array without involving the hash function, and in this case there will be precisely one HT bucket per key. Of course the ability to dynamically extend the array is facilitated by a periodic memory reallocation, so you'll suffer a penalty there (as you might if you didn't pre-allocate a large std::vector in C++).

HTH.

jstephenson
  • 2,170
  • 14
  • 14