23

I'm surprised by C# compiler behavior in the following example:

int i = 1024;
uint x = 2048;
x = x+i;     // error CS0266: Cannot implicitly convert type 'long' to 'uint' ...

That seems reasonable, since int + uint can overflow. However, if the uint is changed to an int, the error disappears, as if int + int could not overflow:

int i = 1024;
int x = 2048;
x = x+i;     // OK, int

Moreover, uint + uint = uint:

uint i = 1024;
uint x = 2048;
x = x+i;     // OK, uint

It seems totally obscure.

Why int + int = int and uint + uint = uint, but int + uint = long?

What is the motivation for this decision?

Eric J.
  • 147,927
  • 63
  • 340
  • 553
Anton K
  • 4,658
  • 2
  • 47
  • 60
  • Check out Eric Lippert's answer below. He was actually involved in the design of C#, so his is the only answer here to `motivation for this decision` that is authoritative. – Eric J. Oct 15 '14 at 16:02
  • Yeah, it's really cool to have an answer from Eric Lippert. – Anton K Oct 15 '14 at 17:49

6 Answers

18

Why int + int = int and uint + uint = uint, but int + uint = long? What is the motivation for this decision?

The way the question is phrased implies the presupposition that the design team wanted int + uint to be long, and chose type rules to attain that goal. That presupposition is false.

Rather, the design team thought:

  • What mathematical operations are people most likely to perform?
  • What mathematical operations can be performed safely and efficiently?
  • What conversions between numeric types can be performed without loss of magnitude and precision?
  • How can the rules for operator resolution be made both simple and consistent with the rules for method overload resolution?

As well as many other considerations such as whether the design works for or against debuggable, maintainable, versionable programs, and so on. (I note that I was not in the room for this particular design meeting, as it predated my time on the design team. But I have read their notes and know the kinds of things that would have concerned the design team during this period.)

Investigating these questions led to the present design: that arithmetic operations are defined as int + int --> int, uint + uint --> uint, long + long --> long, int may be converted to long, uint may be converted to long, and so on.

A consequence of these decisions is that when adding uint + int, overload resolution chooses long + long as the closest match, and long + long is long, therefore uint + int is long.
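A quick way to observe this in action (a minimal sketch; the variable names are mine):

int i = 1024;
uint u = 2048;
var sum = u + i;  // overload resolution picks long operator +(long, long)
Console.WriteLine(sum.GetType().FullName);  // prints System.Int64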

Making uint + int exhibit some different behavior that you might consider more sensible was not a design goal of the team at all, because mixing signed and unsigned values is, first, rare in practice and, second, almost always a bug. The design team could have added special cases for every combination of signed and unsigned one, two, four, and eight byte integers, as well as char, float, double and decimal, or any subset of those many hundreds of cases, but that works against the goal of simplicity.

So in short, on the one hand we have a large amount of design work to make a feature that we want no one to actually use easier to use at the cost of a massively complicated specification. On the other hand we have a simple specification that produces an unusual behavior in a rare case we expect no one to encounter in practice. Given those choices, which would you choose? The C# design team chose the latter.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • 1
    If the programmer really requires one of the "rare in practice" behaviors (and knows that the actual values will not cause a bug such as integer overflow), they can always force a different overload resolution via explicit casting, correct? – Eric J. Oct 15 '14 at 15:59
  • @EricJ.: Exactly. A cast can be seen as the developer giving a clarification to the compiler: No, *really*, I mean to treat this thing as a signed integer even though it is not. – Eric Lippert Oct 15 '14 at 16:29
13

The short answer is "because the Standard says that it shall be so"; see the informative §14.2.5.2 of ISO 23270. The normative §13.1.2 (Implicit numeric conversions) says:

The implicit numeric conversions are:

...

  • From int to long, float, double, or decimal.
  • From uint to long, ulong, float, double, or decimal.

...

Conversions from int, uint, long or ulong to float and from long or ulong to double can cause a loss of precision, but will never cause a loss of magnitude. The other implicit numeric conversions never lose any information. (emph. mine)

The [slightly] longer answer is that you are adding two different types: a 32-bit signed integer and a 32-bit unsigned integer:

  • the domain of a signed 32-bit integer is -2,147,483,648 (0x80000000) — +2,147,483,647 (0x7FFFFFFF).
  • the domain of an unsigned 32-bit integer is 0 (0x00000000) — +4,294,967,295 (0xFFFFFFFF).

So the types aren't compatible, since an int can't hold any arbitrary uint and a uint can't hold any arbitrary int. They are implicitly converted (a widening conversion, per the requirement of §13.1.2 that no information be lost) to the next largest type that can contain both: a long in this case, a signed 64-bit integer, which has the domain -9,223,372,036,854,775,808 (0x8000000000000000) to +9,223,372,036,854,775,807 (0x7FFFFFFFFFFFFFFF).
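To see that this widening really is lossless, note that a long holds both extremes of the two 32-bit domains (a small sketch of my own):

long loInt  = int.MinValue;   // -2,147,483,648 fits in a long
long hiUint = uint.MaxValue;  // +4,294,967,295 fits in a long
Console.WriteLine( "{0} .. {1}" , loInt , hiUint ) ;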

Edited to note: just as an aside, executing this code:

var x = 1024 + 2048u ;
Console.WriteLine( "'x' is an instance of `{0}`" , x.GetType().FullName ) ;

does not yield a long as in the original poster's example. Instead, what is produced is:

'x' is an instance of `System.UInt32`

This is because both operands are constant expressions. The first element in the expression, 1024, has no suffix and as such is an int, and the second element, 2048u, is a uint, according to the rules:

  • If the literal has no suffix, it has the first of these types in which its value can be represented: int, uint, long, ulong.
  • If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.

And since the compiler knows that the constant 1024 fits in a uint, an implicit constant expression conversion turns it into a uint; overload resolution then selects uint + uint, and the sum is folded into a uint constant.
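By contrast, replacing the constant with a variable defeats the constant expression conversion, and the result type is long again, as in the original question (a sketch of my own):

int i = 1024;        // a variable, not a constant expression
var y = i + 2048u;   // no implicit int -> uint here, so long + long is chosen
Console.WriteLine( "'y' is an instance of `{0}`" , y.GetType().FullName ) ;
// 'y' is an instance of `System.Int64`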

Consistency is the hobgoblin of little minds.

Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
  • 1
    Good answer, though I note that the original quote is actually "A foolish consistency is the hobgoblin of little minds". That "foolish" is I think relevant; there are many consistencies which are important and not foolish, and being concerned with them is not small-mindedness. – Eric Lippert Oct 15 '14 at 16:31
  • Even though Eric Lippert did give his opinion, I think this one is better and more persuasive. Although +1 for both of the explanations. – Anton K Oct 15 '14 at 18:35
  • @AntonK ... well, geez. Thanks. [blushes] – Nicholas Carey Oct 15 '14 at 20:26
12

This is a manifestation of overload resolution for numeric types.

Numeric promotion consists of automatically performing certain implicit conversions of the operands of the predefined unary and binary numeric operators. Numeric promotion is not a distinct mechanism, but rather an effect of applying overload resolution to the predefined operators. Numeric promotion specifically does not affect evaluation of user-defined operators, although user-defined operators can be implemented to exhibit similar effects.

http://msdn.microsoft.com/en-us/library/aa691328(v=vs.71).aspx

If you have a look at

long operator *(long x, long y);
uint operator *(uint x, uint y);

from that link, you see those are two possible overloads (the example refers to operator *, but the same is true for operator +).

The uint is implicitly converted to a long for overload resolution, as is int.

From uint to long, ulong, float, double, or decimal.

From int to long, float, double, or decimal.

http://msdn.microsoft.com/en-us/library/aa691282(v=vs.71).aspx
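In other words, the compiler effectively performs something like the following explicit conversions (a sketch of the equivalent code, not the actual emitted IL):

int i = 1024;
uint u = 2048;
long sum = (long)u + (long)i;  // both operands widen to long, then long + long runs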

What is the motivation for this decision?

It would likely take a member of the design team to answer that aspect. Eric Lippert, where are you? :-) Note though that @Nicholas's reasoning below is very plausible, that both operands are converted to the "smallest" type that can contain the full range of values for each operand.

Eric J.
  • 147,927
  • 63
  • 340
  • 553
5

I think the behavior of the compiler is pretty logical and expected.

In the following code:

int i = 2;
int j = 3;
var k = i + j;

There is an exact overload for this operation, so k is an int. The same logic applies when adding two uints, two bytes, or what have you. The compiler's job is easy here; it's happy because overload resolution finds an exact match. There is a pretty good chance that the person writing this code expects k to be an int and is aware that the operation can overflow in certain circumstances.

Now consider the case you are asking about:

uint i = 2;
int j = 3;
var k = i + j;

What does the compiler see? Well it sees an operation that has no matching overload; there is no operator + overload that takes an int and a uint as its two operands. So the overload resolution algorithm goes ahead and tries to find an operator overload that can be valid. This means it has to find an overload where the types involved can "hold" the original operands; that is, both i and j have to be implicitly convertible to said type(s).

The compiler can't implicitly convert uint to int because no such conversion exists. It can't implicitly convert int to uint either, because that conversion doesn't exist (both could cause a change in magnitude). So the only choice it really has is the first broader type that can "hold" both operand types, which in this case is long. Once both operands are implicitly converted to long, k being long is obvious.

The motivation for this behavior is, IMO, to choose the safest available option and not second-guess the coder's dubious intent. The compiler cannot make an educated guess as to what the person writing this code expects k to be. An int? Well, why not a uint? Both options seem equally bad. The compiler chooses the only logical path, the safe one: long. If the coder wants k to be either int or uint, he only has to explicitly cast one of the operands.

And last but not least, the C# compiler's overload resolution algorithm does not consider the return type when deciding the best overload. So the fact that you are storing the operation result in a uint is completely irrelevant to the compiler and has no effect whatsoever on the overload resolution process.
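To illustrate those last two points (a small sketch of my own): the assignment target does not rescue overload resolution, but an explicit cast on an operand does:

uint x = 2048;
int i = 1024;
// x = x + i;      // still error CS0266: the uint target type is ignored
x = x + (uint)i;   // casting one operand selects uint + uint, which yields uint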

This is all speculation on my part, and I may be completely wrong. But it does seem like logical reasoning.

InBetween
  • 32,319
  • 3
  • 50
  • 90
  • The answer is good but I want to clarify that the conversion from int to uint isn't bad because it loses *information*. It loses no information; 100% of the bits are preserved after the conversion. You can round-trip from int to uint and back again with no loss of information. The problem is a change of *magnitude*. The int value -1 becomes the uint value four billion and something, a massive change. – Eric Lippert Oct 15 '14 at 15:51
  • @EricLippert whoops, fixed that. Thank you! – InBetween Oct 15 '14 at 18:37
2
int i = 1024;
uint x = 2048;

// Technique #1: throws OverflowException at run time if i is negative
x = x + Convert.ToUInt32(i);

// Technique #2: also throws OverflowException if i is negative
x = x + checked((uint)i);

// Technique #3: wraps around (reinterprets the bits) if i is negative
x = x + unchecked((uint)i);

// Technique #4: same as #3 in a default (unchecked) compilation context
x = x + (uint)i;
Mark
  • 2,380
  • 11
  • 29
  • 49
  • What is the motivation? – Mark Oct 14 '14 at 19:25
  • Numbers can be manipulated but normalisation isn't; here I just illustrated a way out. Figuratively: a banana (int) is a fruit (long), an apple (uint) is a fruit (long), but a banana is not an apple. – Steven Pinel Oct 14 '14 at 19:38
  • 1
    You might want to add your motivation to your answer (instead of in a comment). – Mark Oct 14 '14 at 19:51
2

The numerical promotion rules for C# are loosely based upon those of Java and C, which work by identifying a type to which both operands can be converted and then making the result the same type. I think such an approach was reasonable in the 1980s, but newer languages should set it aside in favor of one that looks at how values are used (e.g., if I were designing a language, then given Int32 i1,i2,i3,i4; Int64 l; a compiler would process i4=i1+i2+i3; using 32-bit math [throwing an exception in case of overflow] but would process l=i1+i2+i3; with 64-bit math), but the C# rules are what they are and don't seem likely to change.

It should be noted that the C# promotion rules by definition always select the overloads which are deemed "most suitable" by the language specification, but that doesn't mean they're really the most suitable for any useful purpose. For example, double f=1111111100/11111111.0f; would seem like it should yield 100.0, and it would be correctly computed if both operands were promoted to double, but the compiler will instead convert the integer 1111111100 to float yielding 1111111040.0f, and then perform the division yielding 99.999992370605469.
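That claim is easy to verify (a small sketch of my own):

double f = 1111111100 / 11111111.0f;  // the int is converted to float, not double
Console.WriteLine( f ) ;              // 99.9999923706... , not 100.0

double g = 1111111100 / 11111111.0;   // with a double literal, the result is exact
Console.WriteLine( g ) ;              // 100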

supercat
  • 77,689
  • 9
  • 166
  • 211