1

What can I do to manually improve the speed of large switch and if-else statements? I will probably need some kind of hash or lookup table.

I'm working with gcc and C code; I doubt that gcc has any built-in optimizations for this.

Edit: My switch code is what every switch looks like: do something based on whether a particular int has some value. My if-elses look like this:

if( !strcmp( "val1", str ) )
    foo();
else if( !strcmp( "val2", str ) )
    foo2();
...

I also have ifs that do this

if( mystruct.member1 != NULL )
    foo();
if( mystruct.member2 != NULL )
    foo2();

EDIT2: Thank you everyone. I'm not sure which one I should pick as an answer, because a lot of these answers have valid points and valuable insights. Unfortunately, I have to pick just one. But thanks all! In the end, using a perfect hash table seems the best way to get O(1) access time for both the ifs and the switches.

jetru
  • let's say my if conditions each contain just a simple string comparison: if( !strcmp(a, "abc") ) else if(...)... – jetru Oct 31 '10 at 18:31
  • How large is large? Do you have 10 strings to compare? 100? 100 000? – nos Oct 31 '10 at 18:39
  • For me, it's ~30. But I'd be interested to know approaches for larger figures as well. – jetru Oct 31 '10 at 18:40
  • In your second example (!= NULL) can there be more than one member that satisfies the condition? Can you give a bit more information about that scenario because the way it is presented I'm not sure it can be sped up. How and when are those members assigned values? Why not simply assign a function pointer to foo or foo2 at that point? (or maintain a list of functions to call) – Guy Sirton Oct 31 '10 at 19:25
  • It is not so much a performance problem, as trying to improve something that can be improved. – jetru Nov 01 '10 at 11:58

9 Answers

2

To use a hash table:

  1. Pick a hash function. This one is a biggie. There are tradeoffs between speed, the quality of the hash, and the size of the output. Encryption algorithms can make good hash functions. The hash function performs some computation using all the bits of your input value to return some output value with a smaller number of bits.
  2. So the hash function takes a string and returns an integer between 0 and N-1.
  3. Now you can look up a pointer to a function in a table of size N.
  4. Each entry in the table will be a linked list (or some other searchable data structure) because of the chance of collision, that is two strings that map to the same hash value.

E.g.

Let's say hash(char*) returns a value between 0 and 3.
hash("val1") returns 2
hash("val2") returns 0
hash("val3") also returns 0
hash("val4") returns 1

Now your hash table looks something like:

table[0] ("val2",foo2) ("val3", foo3)
table[1] ("val4",foo4)
table[2] ("val1",foo1)
table[3] <empty>

I hope you can see how the cost of matching with a hash table is bounded by the time it takes to compute the hash function plus the small time it takes to search the entry in the hash table. If the hash table is large enough, most hash table entries will have very few items.

Guy Sirton
  • If the set of values is known in advance, you should not have any collisions. Simply choose a perfect hash, or if you're lazy make the hash size a bit larger than the number of values so it's easy (and some slots are left empty). Not having to test multiple values after computing the hash really simplifies your code a lot (not to mention speeds it up), so you should always use a hash with no collisions when possible. – R.. GitHub STOP HELPING ICE Nov 01 '10 at 22:03
2

For strings, if you have a small finite number of possible strings, use a perfect hash and switch on the result. With only 30 or so strings, finding a perfect hash should be pretty easy. If you also need to validate the input, you'll have to do a single strcmp in each case, but that's pretty cheap.

Beyond that, just let the compiler optimize your switches. Only do anything fancier if you've done sufficient testing to know the time spent here is performance-critical.

R.. GitHub STOP HELPING ICE
1

I'm not sure what you are looking for, but branch prediction with gcc is discussed in this question

Drakosha
  • No, I'm not looking for probability based optimizations like that one. I want to improve the general time complexity – jetru Oct 31 '10 at 18:39
1

It does. Just look at the generated code; at the very least it optimizes switches.

You may use a hash table to optimize your code, but I'm sure that GCC does the same for you.

Another matter is if-elses that contain complex boolean expressions; I won't address that part of the question here.

Lavir the Whiolet
  • Please read the edit. How can I use a hash table for code like that? – jetru Oct 31 '10 at 18:50
    @jetru: you can use a hashtable that maps a string (char[]) to function pointers – Lie Ryan Oct 31 '10 at 18:55
  • @jetru: also, for the second case, you're stuck with a linear search; a probability-based optimization (Profile-Guided Optimization) is probably the best you can do – Lie Ryan Oct 31 '10 at 18:57
  • @jetru: Strings are easily hashable. There are a bunch of algorithms, just google them (Java's, for example). As for the second case, I don't know, but there is probably some hash algorithm for that too. – Lavir the Whiolet Oct 31 '10 at 19:19
1

It really depends on the code base you are working with and whether it is open to further/better modularization. If nothing else, I can recommend this:

If some cases are more common than others (one or two things happen more often than the rest), place them at the beginning of the switch/if/else. That way, in the more common cases your program makes only the first one or two comparisons and short-circuits its path. Generally a good idea on its own for any code, IMHO.

gnomed
1

It depends a lot on the strings you are comparing. You could do a switch on some characteristic of the strings:

  • If you know that they differ reliably at the 4th position, you could switch on str[3] and only then do the strcmp.
  • Or compute some sort of checksum and switch on that.

But all of this is quite handcrafted; you should definitely check the assembly that gcc produces.

Jens Gustedt
1

A hash table would be ideal for speeding up a bunch of string compares. You might also want to look into a string library that does not use nul-terminated strings the way the C stdlib does. Lots of string manipulation in C involves "look through the string for the nul, then do your operation". A string library like SafeStr keeps track of the length of each string, so there's no need to burn time scanning for nuls, especially for strings of unequal length.

JimR
1

(I'm quoting some of the prior research I've written on this topic.)
The SPECint2006 benchmark 458.sjeng, which implements a chess simulator, uses many switch statements to process the different chess pieces. Each statement is of a form like:

switch (board[from]) {
case (wpawn): ...
case (wknight): ...
...
}

For this, the compiler (gcc) generates an instruction sequence similar to the following:

40752b: mov -0x28(%rbp),%eax
40752e: mov 0x4238(,%rax,8),%rax
407536: jmpq *%rax

This assembly acts as a lookup table. You can speed up the compiled code further by splitting your switch ... case into multiple switch statements. You'll want to keep the case values consecutive and put the most frequent cases into different switch statements. This particularly improves the indirect branch prediction.

I'll leave the remainder of your questions to others.

Brian
1

Other answers have already suggested using a hash table; I'd recommend generating a perfect hash function using gperf (or a minimal perfect hash function; see the Wikipedia page for a few links).

Hasturkun