
I'm working on a problem that involves a lot of data. Currently the values are represented as unsigned int, but I know the real values never exceed 1000.

Questions

  1. I can use unsigned short to store it. An upside is that it'll take less space to store each value. Will performance suffer?

  2. If I decide to store the data as short while all the calling functions use int, I realize I'll need to convert between these types when storing or extracting values. Will performance suffer? Will the loss in performance be dramatic?

  3. What if I don't use short at all, but instead pack 10-bit values into an array of unsigned int? What happens in that case compared with the previous options?

flashnik
    Profiling and measurement will answer all your questions! The performance penalty for using smaller-than-native type sizes is pretty compiler/system dependent. – Carl Norum May 10 '10 at 22:12
  • 1
    Do keep in mind `sizeof(int)` and `sizeof(short)` can be the same, and might have any number of bits greater than or equal to 16. http://stackoverflow.com/questions/271076/what-is-the-difference-between-an-int-and-a-long-in-c/271132#271132 Switching them could potentially change nothing. – GManNickG May 10 '10 at 22:42
  • @GMan-Save the Unicorns In my case they are different: int is 32 bits and short is 16 bits. – flashnik May 10 '10 at 22:55
  • @Carl: if the question was "is 1 m enough clearance for a bridge over a highway", would we also have to profile and measure before giving any answer? – peterchen May 11 '10 at 14:14
  • @peterchen, are you serious? Those two problems aren't even remotely related. If there were somewhere a standard or even a good way to know analytically what such a performance penalty would be, I'm sure everyone would use it. That is unfortunately not the case - as others have mentioned, it might slow down (because of extra masking and shifting operations, etc.) or it might speed up (more of the data set fits in cache, etc.). Your bridge metaphor doesn't apply to the problem. – Carl Norum May 11 '10 at 16:35
  • @Carl, it's far from a perfect (or even good) analogy, but yes, I'm serious. *Common* platforms don't differ that much in the code vs. data size tradeoff, *most* platforms employ significant caching, on *many* platforms memory bandwidth is an early bottleneck. We can't answer "yes" or "no", but we can give parameters under which this makes a significant difference. --- I *am* a little pissed off by "profile and measure" and "premature optimization" becoming weasel answers stopping any further analysis. Please don't take that personally, you *are* right but it's not the whole story. – peterchen May 12 '10 at 07:44

3 Answers


This all depends on architecture. Bit-fields are generally slower, but if they let you cut memory usage significantly, you can even gain performance through better CPU caching and similar effects. The same goes for short (though the difference is not dramatic in either case).

The best way is to make your source code able to switch representation easily (at compilation time, of course). Then you will be able to test and profile different implementations in your specific circumstances just by, say, changing one #define.

Also, don't forget the rule about premature optimization: make it work first. Only if it turns out to be too slow should you try to speed it up.

  • Yes, I know about premature optimization :) The main reason for switching to lower types is using less memory. So if the loss can be not very big it's worth a try. I'll be using it on x64 and x86 architecture. Compilers are MS VC 9 and ICC 11. – flashnik May 10 '10 at 22:37
  • Potentially reducing working set to 50% or less isn't what Knuth spoke of. All depends on the amount of values, though - bit arithmetics often enough increase the code size more than the data size decreases. – peterchen May 11 '10 at 14:18
  • In my case typical data is about 1GB, so I hope code won't be large enough to make gain in working set meaningless:) – flashnik May 11 '10 at 19:14
  • -1 for warning against premature optimization, this is how we end up with slow software – alternative Nov 17 '14 at 03:13

I can use unsigned short to store it.

Yes, you can use unsigned short (assuming (sizeof(unsigned short) * CHAR_BIT) >= 10 — which the standard in fact guarantees, since unsigned short must be at least 16 bits).

An upside is that it'll take less space to store each value.

Less than what? Less than int? That depends on sizeof(int) on your system.

Will performance suffer?

Depends. The type int is supposed to be the most efficient integer type for your system, so using short instead may hurt performance. Whether it does will depend on the system. Time it and find out.

If I decide to store the data as short while all the calling functions use int, I realize I'll need to convert between these types when storing or extracting values.

Yes. But the compiler will do the conversion automatically. One thing to watch, though, is conversion between signed and unsigned types: if the value does not fit, the result may be implementation-defined.

Will performance suffer?

Maybe. If sizeof(unsigned int) == sizeof(unsigned short), then probably not. Time it and see.

Will the loss in performance be dramatic?

Time it and see.

What if I don't use short at all, but instead pack 10-bit values into an array of unsigned int? What happens in that case compared with the previous options?

Time it and see.

Martin York

A good compromise for you is probably to pack three values into a 32-bit int (with two bits unused). Untangling 10-bit values packed edge-to-edge across a bit array is a lot more expensive and doesn't save much space. You can either use bit-fields, or do it by hand yourself:

(i&0x3FF) // Get i[0]
(i>>10)&0x3FF // Get i[1]
(i>>20)&0x3FF // Get i[2]
i = (i&0x3FFFFC00) | (j&0x3FF)  // Set i[0] to j
i = (i&0x3FF003FF) | ((j&0x3FF)<<10)  // Set i[1] to j
i = (i&0xFFFFF)    | ((j&0x3FF)<<20)  // Set i[2] to j

You can see here what the extra expense is: one bit operation and 2/3 of a shift (on average) per get, and three bit operations and 2/3 of a shift (on average) per set. Probably not too bad, especially if you're mostly reading the values rather than setting them.

Rex Kerr