Overflow checking in C# works by emitting (or not emitting) the overflow-checking forms of arithmetic instructions in CIL.
For example, consider the C# code:
    public static int AddInts(int x, int y)
    {
        return x + y;
    }
Without overflow checking, it would be compiled to something like:
    .method public hidebysig static int32 AddInts(int32 x, int32 y) cil managed
    {
        .maxstack 2
        IL_0000: ldarg.0
        IL_0001: ldarg.1
        IL_0002: add
        IL_0003: ret
    }
With overflow checking, it would be compiled to something like:
    .method public hidebysig static int32 AddInts(int32 x, int32 y) cil managed
    {
        .maxstack 2
        IL_0000: ldarg.0
        IL_0001: ldarg.1
        IL_0002: add.ovf
        IL_0003: ret
    }
As you can see, CIL has separate overflow-checking (`add.ovf`) and non-overflow-checking (`add`) forms of addition, and the same applies to every operation whose behaviour `checked` and `unchecked` affect in C#. CIL's per-instruction approach would be more convenient in code that mixed a lot of checked and unchecked operations, but the vast majority of the time a method's operations are all checked together or all unchecked together, so C#'s approach of having a default for the assembly that can be overridden in a block or expression is more convenient for human coders almost all the time.
What happens to this CIL when it is jitted of course depends on the processor it is jitted for. A likely outcome is either a similar overflow-checking instruction being used (which would cause an interrupt on overflow, which could be used to produce the exception .NET wants here) or a jump-on-overflow instruction like `jo` on x86.
> I'm trying to get a better understanding of what this 'performance hit' is all about.
It is true that `unchecked` is almost always as fast as `checked` and often faster, because ignoring overflow allows a more efficient path more often than it prevents one. Simply put, checking for overflow sometimes requires an extra action, albeit a very fast one at a low level most of the time, and doing something is almost always slower than doing nothing.* As such, in cases where you know overflow can't happen, not checking for that impossible case does give a slight performance benefit.
However, this should be considered a secondary feature of `unchecked`. The primary feature is that it changes what the arithmetic operations mean.
For two 32-bit integers (for example), `unchecked(x + y)` means "add x and y and coerce the result into a 32-bit two's-complement number" while `checked(x + y)` means "add x and y and return the 32-bit two's-complement number that resulted". As such, when `unchecked(int.MaxValue + int.MaxValue)` returns `-2`, that is the right answer, while `checked(int.MaxValue + int.MaxValue)` doesn't have a correct answer and hence raises an exception. They really are different operations, and there are a lot of cases where we want the `-2` the first returns.
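A small sketch of both behaviours (note it uses variables rather than the literal constants, because a constant expression that overflows is a compile-time error, CS0220, in a checked context):

```csharp
using System;

class WrapDemo
{
    static void Main()
    {
        int max = int.MaxValue; // 0x7FFFFFFF

        // The 33-bit sum is 0xFFFFFFFE; keeping the low 32 bits as a
        // two's-complement number gives -2.
        Console.WriteLine(unchecked(max + max)); // prints -2

        try
        {
            int impossible = checked(max + max);
        }
        catch (OverflowException)
        {
            Console.WriteLine("no correct 32-bit answer exists");
        }
    }
}
```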
And as such, our primary concern is "what is the correct answer if we go above the limits of the type?"
- The answer caused by discarding bits is the correct answer, because we are thinking of these values as sets of bits: Use `unchecked`.
- We are thinking of these values as integral numbers representing some count or measure, but their being so high means either some sort of mistake, or quantities we aren't prepared to deal with: Use `checked`.
- We are thinking of these values as integral numbers representing some count or measure, and the math should still be handled correctly for such values: Use `long` or `BigInteger` rather than `int` so you can handle all possibilities correctly. (Perhaps fast-path with just `int` if performance analysis shows it helps significantly.)
- We are thinking of these values as integral numbers, but we're either confident that they'll never be of such magnitude, or can live with overflow having weird results because it must be a case of "garbage in, garbage out": Strictly this fits a `checked` use, but `unchecked` will have the same result, so we can use it for the slight performance gain.
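As an example of the first case, consider a typical hand-rolled hash-code combiner (a sketch; the method name is invented). Here the values are treated as bags of bits, so wrapping on overflow is the *correct* behaviour, and `unchecked` states that intent explicitly even in a project compiled with /checked:

```csharp
public static int CombineHashes(int h1, int h2)
{
    unchecked
    {
        // Classic multiply-and-add hashing; the multiplications are
        // expected to wrap, and the wrapped bits are exactly what we want.
        int hash = 17;
        hash = hash * 31 + h1;
        hash = hash * 31 + h2;
        return hash;
    }
}
```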
Just how often people deal with each depends on what their programming is for. Probably most programs most often deal with the fourth case (how often do your programs deal with more than a few million of anything?): the case where we should semantically be using `checked`, but it makes no real difference, so we might as well get that `unchecked` performance gain.
The first case is very common in a certain class of code, particularly relatively low-level code; often a program's arithmetic fits the fourth case in most of its "business logic" and the first case in a lot of the lower-level work its libraries are doing.
When we need some arithmetic that is close to "real world" math and `int` can't cut it, then we generally still need the "real" result, so we're in some variant of the third case.
Really, the second case, doing some arithmetic and occasionally saying "sorry, I can't deal with this", is very rarely a desired behaviour; `OverflowException` is generally the sort of exception that tells a developer where they have a problem, rather than the sort we catch and turn into an error message that is helpful to the user. As such, most of the time when we have the second case, we thought we had the fourth case, but we were wrong.
For this reason it can be useful to:
- Mark all code that fits the first case (bit-fiddling that should definitely be `unchecked`) as `unchecked` even though that would be redundant with the project defaults.
- Mark all code that really should throw an `OverflowException` because you'll do something useful with it (rare) as `checked`.
- Use `unchecked` most of the time for that little performance boost, but use `checked` in debug builds, either all the time, with occasional runs through your unit tests, or when you've got some strange behaviour to track down. (Just where the balance of caution needs to be will depend on the application.)
*There's also the possibility of branch misprediction, though it's most likely that the non-overflowing case would be both the case most often predicted and the case most often hit. Generally, branch misprediction isn't as big a risk here as it might be in other cases, including manually checking for overflow.