I'm creating an application that will build a very, very large array and search it. I just want to know if there is a good PHP array search algorithm for that task.

Example: I have an array that contains over 2M keys and values; what is the best way to search it?

EDIT: I've created a flat-file DBMS based on arrays, so I want to find the best way to search it.

  • Is your array sorted? If it isn't, I don't think you have any choice but to loop through your array. But anyway, 2 million keys isn't that much for a computer. – kren470 Oct 27 '13 at 21:32
  • Why won't you use a database? Can we see an example of your keys and values? As @kren470 implies, if you are searching by keys, then their being sorted may help. Also, have you tried an ordinary brute-force search? It may be so quick you don't need to optimise it. – halfer Oct 27 '13 at 21:36
  • Alternative datastructures such as SPLHeap might be more appropriate than an array, depending on your data – Mark Baker Oct 27 '13 at 21:47
  • These might be useful to go through : [List of Big-O for PHP functions](http://stackoverflow.com/questions/2473989/list-of-big-o-for-php-functions) , [Memory optimization in PHP array](http://stackoverflow.com/questions/6336528/memory-optimization-in-php-array) – Uours Oct 27 '13 at 21:53

2 Answers


A couple of things:

  1. Try it: benchmark several approaches and see which one is fastest
  2. Consider using objects
  3. Do think about DBs at least... it could be a NoSQL key->value storage thing like Redis.io (which is dead-fast)
  4. Search algorithms: sure, there are plenty of them around

But storing an assoc array of 2M keys in memory means you'll have tons of hash collisions, which will slow you down anyway. Sort the array, chunk it, and apply a decent search algorithm and you might get it to work reasonably fast, but to be brutally honest, I'd say you're about to make a bad decision.
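The sort-then-search idea can be sketched as a classic binary search. This is a hypothetical helper (`binarySearch()` is not a built-in) and it assumes a numerically indexed, already-sorted array:

```php
<?php
// Illustrative sketch: binary search over a sorted, numerically indexed array.
// binarySearch() is a hypothetical helper, not a PHP built-in.
function binarySearch(array $sorted, $needle): int
{
    $lo = 0;
    $hi = count($sorted) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] === $needle) {
            return $mid;          // found: return the index
        }
        if ($sorted[$mid] < $needle) {
            $lo = $mid + 1;       // needle is in the upper half
        } else {
            $hi = $mid - 1;       // needle is in the lower half
        }
    }
    return -1;                    // not found
}

$data = range(0, 199999, 2);      // 100k even numbers, already sorted
var_dump(binarySearch($data, 100000)); // int(50000)
var_dump(binarySearch($data, 99999));  // int(-1)
```

Each probe halves the remaining range, so even a 2M-element sorted array needs only ~21 comparisons per lookup.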

Also consider this: PHP is stateless by design, so each time your script runs, the data has to be loaded into memory again (for each request, if it's a web application you're writing). It's not unlikely that that will be a bigger bottleneck than a brute-force search on a HashTable will ever be.
The quickest way to find out is to run a test, once with APC (or alternatives) turned off, and then again, but cache the array you want to search first. Measure the difference between the two, and you'll get an idea of how much the actual construction of the array is costing you.
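A rough sketch of that kind of measurement (the array size is scaled down here, and the timings are purely illustrative; they'll vary by machine):

```php
<?php
// Illustrative sketch: compare the cost of *building* the array
// with the cost of a single brute-force search over it.
$t0 = microtime(true);
$data = [];
for ($i = 0; $i < 200000; $i++) {
    $data['key' . $i] = $i;       // build a 200k-entry assoc array
}
$build = microtime(true) - $t0;

$t0 = microtime(true);
$found = array_search(199999, $data, true); // worst case: last value
$search = microtime(true) - $t0;

printf("build: %.4fs, search: %.4fs\n", $build, $search);
// Typically the build step dominates a single lookup by a wide margin.
```

If the build time dwarfs the search time, optimising the search algorithm is attacking the wrong bottleneck.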

Elias Van Ootegem

The best way to go would be to use array_search(). PHP's built-in functions are written in C and heavily optimized.
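A minimal example of what that looks like (the third argument enables strict `===` comparison):

```php
<?php
// array_search() scans the values linearly and returns the first
// matching key, or false when the value isn't present.
$haystack = ['apple' => 1, 'banana' => 2, 'cherry' => 3];

var_dump(array_search(2, $haystack, true));  // string(6) "banana"
var_dump(array_search(9, $haystack, true));  // bool(false)
```

Note that `false` can be ambiguous if `false` itself is a stored value, so comparing the result with `!==` false is the safe check.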

If this is still too slow, you should switch to another programming language (PHP isn't popular for its speed).

There are also algorithms available that use your graphics card to search for specific values in parallel.

Joren
  • I've found a faster way: use `array_flip()` and then check with `isset()`. Of course, it's not "safe" when there are duplicate values (which become keys when flipped) – HamZa Oct 27 '13 at 22:40
  • @HamZa: `array_flip` is _O(n)_ + `isset` is an _O(n)_ (realistically closer to _O(1)_) operation. Combine the two, and the worst case scenario of your approach is an _O(2n)_ operation, but realistically, it'll be closer to _O(n+1)_ – Elias Van Ootegem Oct 28 '13 at 07:40
  • @EliasVanOotegem I once tested it for a Project Euler problem, and trust me, it was way faster – HamZa Oct 28 '13 at 09:13
  • @HamZa Big O notation is just a tool to paint a picture of how the curve in a coordinate system (calculation time against size of data) would look (linear, quadratic, logarithmic, etc.). In practice, the `+1` is usually omitted since it has no effect on the shape of the curve – Joren Oct 28 '13 at 09:36
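The `array_flip()` + `isset()` trick from this comment thread can be sketched like so (only safe when all values are unique and usable as array keys, i.e. ints or strings):

```php
<?php
// Sketch of the array_flip() + isset() approach: pay O(n) once to flip,
// then every membership test is a hash lookup instead of a linear scan.
// Only safe when all values are unique and are valid array keys.
$haystack = [];
for ($i = 0; $i < 100000; $i++) {
    $haystack['key' . $i] = $i;
}

$flipped = array_flip($haystack);   // one O(n) flip up front

var_dump(isset($flipped[99999]));   // bool(true)
var_dump(isset($flipped[100000]));  // bool(false)

// The flipped array also maps each value back to its original key:
echo $flipped[42], "\n";            // key42
```

This amortises well when you search the same array many times; for a single lookup, a plain `array_search()` avoids the cost of the flip.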