
I am implementing a VM compiler, and naturally, I've come to the point of implementing switches. Also naturally, for short switches a sequential lookup array is optimal, but what about bigger switches?

So far I've come up with a data structure that gives me a pretty good lookup time. I don't know the name of that structure, but it is similar to a binary tree, except that it is monolithic: it only applies to a static set of integers, with no additions or removals. It looks like a table where values increase upward and to the right; here is an example:

Integers -89, -82, -72, -68, -65, -48, -5, 0, 1, 3, 7, 18, 27, 29, 32, 37, 38, 42, 45, 54, 76, 78, 87, 89, 92

and the table:

-65    3   32   54   92
-68    1   29   45   89
-72    0   27   42   87
-82   -5   18   38   78
-89   -48  7    37   76

This gives me a worst case of width + height iterations. Let's say the case is 37: -65 is less than 37, so move right; same for 3, move right; same for 32, move right; 54 is bigger, so move down (a jump of one row width, since it is stored as a sequential array anyway); 45 is bigger, so move down; 42 is bigger, so move down; 38 is bigger, so move down, and there we have 37 in 7 hops.
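For illustration, here is a minimal C sketch of that staircase walk, assuming the table is stored row-major as one flat array (names are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/* The 5x5 table from above, row-major: values increase to the right
   and decrease downward. */
static const int table[25] = {
    -65,   3, 32, 54, 92,
    -68,   1, 29, 45, 89,
    -72,   0, 27, 42, 87,
    -82,  -5, 18, 38, 78,
    -89, -48,  7, 37, 76,
};

/* Start at the top-left corner; each probe eliminates a row or a
   column, so the loop runs at most w + h - 1 times. */
static bool staircase_find(const int *t, size_t w, size_t h, int target)
{
    size_t row = 0, col = 0;
    while (row < h && col < w) {
        int v = t[row * w + col];
        if (v == target)
            return true;
        if (v < target)
            col++;      /* current value too small: move right */
        else
            row++;      /* current value too big: move down one row */
    }
    return false;
}
```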

Is there any possible faster lookup algorithm?

Also, is there a name for this kind of arrangement? I came up with it on my own, but most likely someone else did before me, so it probably has a name already.

EDIT: OK, as far as I understand, a "perfect hash" will offer better THEORETICAL performance. But how will this play out in real life? If I understand correctly, a two-level "perfect hash" will be rather spread out instead of a continuous block of memory, so while the theoretical complexity is lower, there is a potential penalty of tens if not hundreds of cycles before that memory is fetched. In contrast, a slower theoretical worst case may actually perform better just because it is more cache friendly than a perfect hash... Or not?

asked by dtech, edited by Brian Tompsett - 汤莱恩
  • A binary search on a sorted list will find your item in 5 iterations, max. (On this list of 20 items.) Binary search is O(log n). So finding something in a list of a million items will only take 20 iterations. – Jim Mischel Aug 20 '13 at 13:20
  • @JimMischel - regardless of the number of cases? I am using a sorted list, but I need width + height iterations... – dtech Aug 20 '13 at 13:22
  • Why not store the integers as you come across them and use a hash table to find their indices? – Codie CodeMonkey Aug 20 '13 at 13:25
  • Yes, [binary search](http://en.wikipedia.org/wiki/Binary_search_algorithm) will outperform your lookup table. There may be specific numbers in the list for which your lookup table is faster, but overall the binary search will be faster. It will never take more than Log2(n) probes. See the Wikipedia article. – Jim Mischel Aug 20 '13 at 13:29
  • http://stackoverflow.com/questions/2596320 – BlueRaja - Danny Pflughoeft Aug 20 '13 at 13:39
  • This is just a diamond-shaped binary tree. If you want faster access, use a balanced tree or a hashtable. If all the numbers are known at compile time, you can even build a perfect hash. – Lee Daniel Crocker Aug 20 '13 at 13:57
  • @LeeDanielCrocker - care to elaborate in an answer or post a link to the "perfect hash"? – dtech Aug 20 '13 at 14:02
  • Google "perfect hash", watch some YouTube lectures. Standard computer science fare. – Lee Daniel Crocker Aug 20 '13 at 14:15
  • Different question: what does your vector of numbers represent? Is there anything you can change about that to make it more regular? – Bas Swinckels Aug 20 '13 at 16:29
  • @BasSwinckels - it is just arbitrary integers. In a real world situation it will represent the user cases in a switch statement, which are context specific. – dtech Aug 20 '13 at 16:33
  • Ok, now I understand. In that case, you need a fast algorithm for the 'compiled' user code **and** a (not necessarily super fast) algorithm to generate that code. Since the switch value can be arbitrary, you will be forced to check if the value is present in the hash table or not, even in case of a 'perfect hash'. For simplicity, I would just go for the binary search, and only update to a hash function if you can show that it is a bottleneck. I guess you will have bigger problems than that (like making it work in the first place), don't waste time on premature optimization! – Bas Swinckels Aug 20 '13 at 17:03
  • The only special case you might want to optimize for is if all the possible switch options are consecutive. In this case, it is straightforward to implement the switch as a [jump table](http://en.wikipedia.org/wiki/Jump_table), which is probably the fastest possible implementation. – Bas Swinckels Aug 20 '13 at 17:15
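For illustration, the jump-table idea from the last comment can be sketched in C. The handler functions and case values here are hypothetical:

```c
/* Hypothetical handlers for four consecutive cases, 10 through 13. */
static int op_load(void)  { return 1; }
static int op_store(void) { return 2; }
static int op_add(void)   { return 3; }
static int op_jump(void)  { return 4; }

/* A switch over consecutive case values reduces to one range check
   plus an indexed call through a table of targets. */
static int (*const jump_table[4])(void) = { op_load, op_store, op_add, op_jump };

static int dispatch(int value)
{
    if (value < 10 || value > 13)
        return -1;                   /* default case */
    return jump_table[value - 10](); /* O(1) dispatch */
}
```

Compilers typically emit exactly this shape for dense switches; for sparse cases the table would waste space, which is why the question arises at all.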

2 Answers


When implementing switches among a diverse set of alternatives, you have several options:

  • Make several groups of flat lookup arrays. For example, if you see the numbers 1, 2, 3, 20000, 20001, 20002, you could do a single if to choose between the 1..3 group and the 20000..20002 group, and then employ two flat lookup arrays.
  • Discover a pattern. For example, if you see numbers 100, 200, 300, 400, 500, 600, divide the number by 100, and then go for a flat lookup array.
  • Make a hash table. Since all the numbers that you are hashing are known to you, you can play with the load factor of the table to make sure that the lookup is not going to take a lot of probing.
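The third option can be sketched as follows, a minimal linear-probing table built over the question's key set with a deliberately low load factor (the hash function and table size here are arbitrary choices, not a recommendation):

```c
#include <stddef.h>

#define TABLE_SIZE 64   /* 25 keys in 64 slots: load factor ~0.4 */
#define EMPTY (-1000)   /* sentinel chosen outside the key range */

static const int keys[25] = {
    -89, -82, -72, -68, -65, -48, -5, 0, 1, 3, 7, 18, 27,
    29, 32, 37, 38, 42, 45, 54, 76, 78, 87, 89, 92,
};

static int slots[TABLE_SIZE];

/* Cheap multiplicative hash; since the key set is fixed, probe
   lengths can be verified once at compile time of the switch. */
static size_t hash(int key)
{
    return ((unsigned)key * 2654435761u) % TABLE_SIZE;
}

static void build(const int *k, size_t n)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        slots[i] = EMPTY;
    for (size_t i = 0; i < n; i++) {
        size_t h = hash(k[i]);
        while (slots[h] != EMPTY)        /* linear probing */
            h = (h + 1) % TABLE_SIZE;
        slots[h] = k[i];
    }
}

static int contains(int key)
{
    size_t h = hash(key);
    while (slots[h] != EMPTY) {          /* empty slot ends the probe run */
        if (slots[h] == key)
            return 1;
        h = (h + 1) % TABLE_SIZE;
    }
    return 0;
}
```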

Your algorithm is similar to binary search, in the sense that it's from the "divide and conquer" family. Such algorithms have logarithmic time complexity, which may not be acceptable for switches, because they are expected to be O(1).

Sergey Kalinichenko
  • Since `width ~= height ~= sqrt(n)`, OP's algorithm is `O(sqrt(n))`. Others have suggested binary search, which will give you `O(log2(n))`, already better. But a hash-table will give you `O(1)`, so that seems indeed the best way to go. If the list of numbers is fixed and known, you should be able to find some hash function that will give you no collisions. – Bas Swinckels Aug 20 '13 at 13:38
  • @BasSwinckels - I didn't consider a hash table to be faster, got to run a few tests. – dtech Aug 20 '13 at 13:43
  • On the other hand, if the number of entries is fixed and known (and if it cannot change!), then the actual cost can be calculated and the big-O of the algorithm is irrelevant. Hashing is often a good choice, especially if you have a cheap hashing algorithm. – Klas Lindbäck Aug 20 '13 at 13:45
  • @ddriver Hash tables may not necessarily be faster in all cases, only *asymptotically* faster. If you have a good hash function and a good load factor, hash tables are nearly as fast as flat array lookups. Conversely, if you have a poor hashing function that generates lots of collisions, you get awful results closer to a linear search. – Sergey Kalinichenko Aug 20 '13 at 13:48

Is there any possible faster lookup algorithm?

Binary search is faster.

Binary search completes in log2(w*h) = log2(w) + log2(h).

Your algorithm completes in w+h.
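A minimal sketch of that binary search over the sorted case list from the question (log2(25) rounds up to 5 probes at most):

```c
#include <stddef.h>

/* The 25 case values from the question, sorted ascending. */
static const int cases[25] = {
    -89, -82, -72, -68, -65, -48, -5, 0, 1, 3, 7, 18, 27,
    29, 32, 37, 38, 42, 45, 54, 76, 78, 87, 89, 92,
};

/* Classic binary search: returns the index of key, or -1 if absent.
   Each probe halves the remaining range. */
static int binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */
        if (a[mid] == key)
            return (int)mid;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```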

Klas Lindbäck