18

With my compiler, c is 54464 (16 bits truncated) and d is 10176. But with gcc, c is 120000 and d is 600000.

What is the correct behavior? Is the behavior undefined? Or is my compiler wrong?

unsigned short a = 60000;
unsigned short b = 60000;
unsigned long c = a + b;
unsigned long d = a * 10;

Is there an option to alert on these cases?

-Wconversion warns on:

void foo(unsigned long a);
foo(a+b);

but doesn't warn on:

unsigned long c = a + b;
nbro
VTiTux
  • 6
    Related to [What happens when a integer overflow occurs in a C expression?](http://stackoverflow.com/q/26195811/1708801) also see [Why must a short be converted to an int before arithmetic operations in C and C++?](http://stackoverflow.com/q/24371868/1708801) – Shafik Yaghmour Jul 27 '15 at 12:06
  • 2
    What is `your compiler`? It looks like an embedded C compiler, so it would be better to look at an embedded toolchain specifically – Jason Hu Jul 27 '15 at 12:20
  • If you're not using an embedded compiler, you're probably using Turbo C, which is not a "real" C compiler. Throw it away ASAP – phuclv Jul 27 '15 at 12:56
  • I'm using the C30 compiler for Microchip devices – VTiTux Jul 28 '15 at 11:55

3 Answers

16

First, you should know that the C standard does not fix the precision (the number of representable values) of the standard integer types. It only requires a minimum precision for each type. These minimums correspond to the following bit sizes; an implementation may use wider types, and the standard allows for more complex representations:

  • char: 8 bits
  • short: 16 bits
  • int: 16 (!) bits
  • long: 32 bits
  • long long (since C99): 64 bits

Note: The actual limits (which imply a certain precision) of an implementation are given in limits.h.
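As a rough illustration of that note, the sketch below derives the number of value bits of unsigned int from UINT_MAX in limits.h. The helper name uint_value_bits is made up for this example and is not part of any library.

```c
#include <limits.h>

/* Hypothetical helper: counts the value bits of unsigned int by
   shifting UINT_MAX right until nothing is left. */
int uint_value_bits(void)
{
    unsigned int max = UINT_MAX;
    int bits = 0;

    while (max != 0) {
        max >>= 1;
        bits++;
    }
    return bits;
}
```

The standard guarantees a result of at least 16. A 16 bit embedded compiler would typically report 16 here, while a PC compiler would typically report 32.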

Second, the type in which an operation is performed is determined by the types of its operands, not by the type of the left side of an assignment (because assignments are also just expressions). For this purpose the types given above are ordered by conversion rank. Operands with a smaller rank than int are first converted to int (or to unsigned int if int cannot represent all their values); these are the integer promotions. For the remaining cases, the operand with the smaller rank is converted to the type of the other operand. Together these steps are the usual arithmetic conversions.

Your implementation seems to use a 16 bit unsigned int, the same size as unsigned short, so a and b are converted to unsigned int and the operation is performed with 16 bits. For unsigned types, the operation is performed modulo 65536 (2 to the power of 16); this is called wrap-around (it is not required for signed types!). The result is then converted to unsigned long and assigned to the variable.
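The wrap-around can be reproduced on any platform by forcing the result back into 16 bits. This is only a sketch: the helper names are invented for illustration, and it assumes the implementation provides uint16_t and uint32_t from stdint.h.

```c
#include <stdint.h>

/* Hypothetical helpers that mimic a 16 bit unsigned int by converting
   each result back to uint16_t, i.e. reducing it modulo 65536. */
uint16_t add_mod_65536(uint16_t x, uint16_t y)
{
    /* Widen first so the sum itself cannot overflow on any platform. */
    return (uint16_t)((uint32_t)x + y);
}

uint16_t mul_mod_65536(uint16_t x, uint16_t y)
{
    /* Same idea for the product: compute wide, then reduce. */
    return (uint16_t)((uint32_t)x * y);
}
```

add_mod_65536(60000, 60000) gives 54464 (120000 - 65536) and mul_mod_65536(60000, 10) gives 10176 (600000 - 9 * 65536), the two values reported in the question.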

For gcc, I assume this compiles for a PC or a 32 bit CPU. There, (unsigned) int typically has 32 bits, while (unsigned) long has at least 32 bits (required). So there is no wrap-around for these operations.

Note: For the PC, the operands are converted to int, not unsigned int. This is because int can already represent all values of unsigned short, so unsigned int is not required. This can result in unexpected (actually: undefined) behaviour if the result of the operation overflows a signed int!

If you need types of defined size, see stdint.h (since C99) for uint16_t, uint32_t. These are typedefs to types with the appropriate size for your implementation.

You can also cast one of the operands (not the whole expression!) to the type of the result:

unsigned long c = (unsigned long)a + b;

or, using types of known size:

#include <stdint.h>
...
uint16_t a = 60000, b = 60000;
uint32_t c = (uint32_t)a + b;

Note that due to the conversion rules, casting one operand is sufficient.
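To see why the cast has to go on an operand and not on the whole expression, here is a small sketch one size up (32 bit operands, 64 bit result), so that the effect also shows on a PC. The function names are made up, and the fixed-width types from stdint.h are assumed to exist.

```c
#include <stdint.h>

/* Cast applied to the finished sum: the addition has already happened
   in the narrower type. The inner (uint32_t) makes that 32 bit
   wrap-around explicit, so the result does not depend on the width
   of int. */
uint64_t cast_whole_expression(uint32_t x, uint32_t y)
{
    return (uint64_t)(uint32_t)(x + y);
}

/* Cast applied to one operand: the other operand is converted to
   uint64_t as well, so the addition is carried out in 64 bits. */
uint64_t cast_one_operand(uint32_t x, uint32_t y)
{
    return (uint64_t)x + y;
}
```

With both arguments set to 4000000000, the first function returns 3705032704 (the sum reduced modulo 2 to the power of 32) and the second returns 8000000000.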

Update (thanks to @chux):

The cast shown above works without problems. However, if a has a larger conversion rank than the type named in the cast, the cast might truncate its value to the smaller type. While this can easily be avoided, as all types are known at compile time (static typing), an alternative is to multiply by 1 of the wanted type:

unsigned long c = ((unsigned long)1U * a) + b;

This way the operation uses whichever type has the larger rank: the one given in the cast or that of a (or b). The multiplication will be eliminated by any reasonable compiler.
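The difference between the two approaches only shows when the operand is wider than the type named in the cast. A minimal sketch, with invented function names and assuming the fixed-width types from stdint.h:

```c
#include <stdint.h>

/* The cast narrows x to 32 bits before the addition takes place. */
uint64_t widen_by_cast(uint64_t x, uint16_t y)
{
    return (uint32_t)x + y;
}

/* Multiplying by a uint32_t 1 converts to whichever operand type has
   the larger rank, here uint64_t, so x keeps all of its bits. */
uint64_t widen_by_multiply(uint64_t x, uint16_t y)
{
    return (uint32_t)1 * x + y;
}
```

For x = 4294967297 (2 to the power of 32, plus 1) and y = 1, the cast version returns 2 because the upper bits of x are lost, while the multiply version returns 4294967298.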

Another approach, which avoids even having to know the name of the target type, uses the typeof() gcc extension:

unsigned long c;

/* ... many lines of code ... */

c = ((typeof(c))1U * a) + b;
too honest for this site
  • 1
    The transition from limits.h to the table of ranges was a little abrupt. Elaborating on the difference: The Standard lists representable ranges. The table of sizes you give is compatible with the Standard. limits.h describes the compiler implementation's limits and it is reasonable to expect that it meets or exceeds the values in your table. – Eric Towers Jul 27 '15 at 18:29
  • 1
    Detail: `(uint32_t)a + b;` may result in a _narrowing_ of `a`. (not in this trivial example though). _In general_, to widen an integer, suggest multiplying by `1` of the type as in `1UL*a + b;` rather than casting. This approach will never result in a narrowing of `a`. – chux - Reinstate Monica Jul 27 '15 at 18:56
  • @EricTowers: Thanks, As non-native speaker, I sometimes have to edit my texts multiple times to get a good phrasing. Hope the edit is better now. – too honest for this site Jul 27 '15 at 19:40
  • Seems much better now. Even as a native speaker, I have to edit my texts multiple times to fix phrasing that is fine while typing (with multiple threads of context in mind) and not so much while reading (restricting to only the written context). – Eric Towers Jul 27 '15 at 20:05
  • It may also be worth noting what should be described as a defect in the standard: if `int` is larger than `unsigned short`, but no more than twice as big, a compiler given `unsigned short x = (something); if (x < 46341) do_something(); x*=x;` would be allowed by the C standard to make the call to `do_something` unconditional even if the purpose of the multiplication is to perform arithmetic mod 65536. – supercat Jul 27 '15 at 21:26
  • @supercat: `x *= x` <=> `x = x * x` (semantically). Here `x` is first promoted to `int`, so the result of the multiplication is performed _implementation defined_ (that's why I emphasise on unsigned types). Read [here](http://port70.net/~nsz/c/c11/n1570.html#6.5p4). If you do not want this behaviour, you might `x *= (unsigned)x;` (or use the multiplicative-cast chux used) - To the unaware, C can be pretty nasty. But Modula (or Pascal) apparently lost the race for the better language. – too honest for this site Jul 27 '15 at 23:16
  • @Olaf: On systems where `int` is 32 bits but `unsigned short` is 16 bits (quite typical), then multiplying two `unsigned short` values whose product exceeds 2147483647 is Undefined Behavior (not just Implementation-Defined), authorizing compilers to negate the laws of time and causality. As for C versus Pascal, the C family of programming dialects had the prestige of being a supposed "standard" at the same time as it allowed individual implementations to take advantage of platform-specific features. There were also some things that popular C compilers could do that Pascal compilers couldn't. – supercat Jul 28 '15 at 18:43
  • @Olaf: Pascal had a really good run in the 1980s and into the 1990s; C was a better fit for 1980s hardware than today's hardware, except for floating-point which it basically ruined. Getting back to the subject at hand: in the Pascal dialects I've used, the largest integer type didn't have an unsigned form, and mixed signed/unsigned operations of any smaller type would promote to the next size up. Pascal regards unsigned numbers as whole numbers; C regards them sometimes as being modular-arithmetic values. IMHO, a good language should offer separate types for both purposes. – supercat Jul 28 '15 at 18:50
  • @supercat: I did not say the opposite, but was a bit imprecise. You are right, it is _undefined behaviour_, but for most actual implementations, the result depends on the underlying CPU and few (if not none) generate a signal on integer overflow (but some might saturate). And: that also applies to 16 bit short and 22 bit int, btw. I understood your comment that you wondered about the result. (Note: Pascal also has an ISO standard. Ada is still updated. However, SO is the wrong place to discuss; it was just an example for strongly typed languages. You could also take Python as a modern example.) – too honest for this site Jul 28 '15 at 18:58
6

a + b will be computed as an unsigned int (the fact that it is assigned to an unsigned long is not relevant). The C standard mandates that this sum will wrap around modulo "one plus the largest unsigned possible". On your system, it looks like an unsigned int is 16 bit, so the result is computed modulo 65536.

On the other system, it looks like int and unsigned int are larger, and therefore capable of holding the larger numbers. What happens now is quite subtle (acknowledgement to @PascalCuoq): Because all values of unsigned short are representable in int, a + b will be computed as an int. (Only if short and int are the same width or, in some other way, some values of unsigned short cannot be represented as int will the sum be computed as unsigned int.)

Although the C standard does not specify a fixed size for either an unsigned short or an unsigned int, your program behaviour is well-defined. Note that this is not true for a signed type though.
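For signed operands, overflow is undefined behaviour, so any range check has to come before the addition. The following is only a sketch of one common way to do that; add_int_checked is a name invented for this example.

```c
#include <limits.h>

/* Hypothetical helper: stores x + y in *out and returns 1 when the sum
   fits in int; returns 0 and leaves *out untouched otherwise. The range
   test runs before the addition, so no overflow ever takes place. */
int add_int_checked(int x, int y, int *out)
{
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y)) {
        return 0;
    }
    *out = x + y;
    return 1;
}
```

A caller would test the return value before using the result, for example `if (add_int_checked(x, y, &sum)) { ... }`.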

As a final remark, you can use the sized types uint16_t, uint32_t etc. which, if supported by your compiler, are guaranteed to have the specified size.

Bathsheba
  • Note that if the program had multiplied `a*b` rather than adding them, the result would not be defined on platforms where `int` is larger than `unsigned short` but not more than twice as big. – supercat Jul 27 '15 at 21:27
3

In C the types char, short (and their unsigned counterparts) and float should be considered "storage" types: they are designed to optimize storage but are not the "native" size that the CPU prefers. The integer ones among them are never used directly for computations.

For example, when you have two char values and place them in an expression, they are first converted to int, and then the operation is performed. The reason is that the CPU works better with int. In pre-standard (K&R) C the same happened for float, which was always implicitly converted to double for computations; since C89 an implementation may compute float expressions in float, but float arguments passed to variadic functions such as printf are still promoted to double.
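The promotion of char operands can be observed with sizeof, which reports the size of the type of an expression without evaluating it. A minimal sketch; the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the size of the type the compiler uses
   for the sum of two char operands. sizeof does not evaluate x + y,
   it only inspects the type of the expression. */
size_t size_of_char_sum(void)
{
    char x = 1, y = 2;

    return sizeof(x + y);
}
```

The returned value equals sizeof(int), not sizeof(char), because both operands are promoted before the addition.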

In your code the computation a+b is a sum of two values of type int or unsigned int, depending on the platform; in C there's no way of computing the sum of two unsigned shorts... what you can do is store the final result in an unsigned short, which, thanks to the properties of modular arithmetic, will hold the same value as a true unsigned short sum would.

6502