12

Using gcc version 4.8.4 on Linux, short is 16 bit and int is 32 bit.

#include "stdio.h"
int main( void ){
  unsigned short u = 0xaabb;
  unsigned int   v = 0xaabb;
  printf ("%08x %08x\n", u, (unsigned short)((u*0x10001)/0x100));
  printf ("%08x %08x\n", v, (unsigned short)((v*0x10001)/0x100));
  return 0;
}

Result:

0000aabb 0000bbab
0000aabb 0000bbaa

This can be varied, e.g., by dividing with 0x10, which produces a similar result (+1) for the first case. The effect does not occur if the byte truncated by /0x100 is less than 0x80. Machine code for the first case (short u) looks as if some rounding (addition of 0xFF) is intended.

  1. What is the reason for the result or is it a bug?
  2. What is the result for other compilers?
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
laune
  • 31,114
  • 3
  • 29
  • 42

5 Answers5

14

A literal like 0x10001 will be of type int (if it can fit inside an int, which is true in this case). int is a signed type.

Since the variable u is a small integer type, it gets integer promoted to int whenever used in an expression.

0xaabb * 0x10001 would supposedly give the result 0xAABBAABB. However, that result is too large to fit inside an int on a 32 bit two's complement system, where the largest number for an int is 0x7FFFFFFF. So you get an overflow on a signed integer and therefore invoke undefined behavior - anything can happen.

Never use signed integers when doing any form of binary arithmetic!

Furthermore, the final cast to (unsigned short) is futile, because printf argument promotes the passed value to int anyhow. Which is strictly speaking incorrect too, because %x means that printf expects an unsigned int.

To avoid all trouble with the unpredictable and limited default integer types in C, use stdint.h instead. Also, using unsigned int literals solves a lot of implicit type promotion bugs.

Example:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main( void ){
  uint16_t u = 0xaabb;
  uint16_t v = 0xaabb;
  printf ("%08" PRIx16 " %08" PRIx16 "\n", u, (uint16_t)(u*0x10001u/0x100u));
  printf ("%08" PRIx16 " %08" PRIx16 "\n", v, (uint16_t)(v*0x10001u/0x100u));
  return 0;
}

(This code will have argument promotion too, but by using the PRIx16 format specifier, you tell printf that it is now the compiler's business to make the code work, regardless of what type promotions that might be present in the function call.)

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • Why would 0xAABBAABB not fit inside an int? It is a 32-bit quantity. – Erik Alapää Feb 23 '16 at 14:18
  • @ErikAlapää Because the largest possible number is 0x7FFFFFFF. I've made an edit to clarify. – Lundin Feb 23 '16 at 14:19
  • Right. But I would say, use unsigned for bitwise ops like and, or and bitshifts, but use signed for arithmetic. – Erik Alapää Feb 23 '16 at 14:23
  • 1
    @ErikAlapää: This is wrong advice for arithmetics (Isupport it for bitops, though). Whether you use signed or unsigned depends on what you want to achieve. Here it is perfectly valid. Problems often arise because of mixing them - like here. – too honest for this site Feb 23 '16 at 14:27
  • The final cast isn't futile - it just seems so here. It is the result that should be computed. - Do you mean "bit manipulation" where signed ints are to be avoided? – laune Feb 23 '16 at 14:27
  • @ErikAlapää There almost certainly exists no case where you would want to use signed integers together with hex notation literals. All it will cause is subtle bugs and head ache. This is actually a huge problem for the embedded industry, when naive PC programmers try to write hardware programming C code, with their ints and floats. – Lundin Feb 23 '16 at 14:28
  • You shouldn't use signed numbers when: doing bitwise operations or dealing with hex literals. I suppose a better word might be hexadecimal arithmetic? – Lundin Feb 23 '16 at 14:30
  • To explain the genesis of the snippet would be "too broad". It wasn't coded in C or C++ in the first place, it was merely used to display the behaviour of gcc's backend. – laune Feb 23 '16 at 14:40
  • @Lundin, yes, saying hexadecimal arithmetic is better. And when I program close to hw, I definitely prefer using specific integer types like e.g. uint16_t etc. It is essential to know how wide your vars are. – Erik Alapää Feb 23 '16 at 14:47
  • @Olaf: I think the natural thing for integer arithmetic is to used signed int. As discussed in the comments, arithmetic with hex constants may be better using unsigned. In general, I prefer signed int as my default integer variables, since you can e.g. use -1 in return values etc. – Erik Alapää Feb 23 '16 at 14:51
  • @ErikAlapää: But you are aware `size_t` as returned by the `sizof` operator **is** unsigned, are you? One thing a good C programmer knows is when to use signed and when unsigned. And he is fit with both kinds. Once you move to advanced programming aside the padded armchair of a PC, you might see. – too honest for this site Feb 23 '16 at 14:54
  • @Olaf, I have done a lot of 'advanced programming' on embedded systems. What's with your attitude? Of course size_t is unsigned, and of course unsigned quantities are useful. Here is a good discussion, look at the accepted answer and realize that Stroustrup himself recommends signed int as default. http://stackoverflow.com/questions/18795453/why-prefer-signed-over-unsigned-in-c – Erik Alapää Feb 23 '16 at 14:58
  • @ErikAlapää: Argumentum ad verecundiam is alwas a bad idea. As it shows you have no true argument. As it is useless to discuss a religion, I'll leave it at that. Btw, Stroustrup also mentioned C is a subset of C++. That was is not true for at least >16 years now. – too honest for this site Feb 23 '16 at 15:07
  • So you think it is a bad idea to listen to what the language's creator says? I disagree. And for most practical purposes, C++ IS a true superset of C (C99 has some non-C++ constructs, that is well-known). As for argument, I use ints because I want 2 - 3 to return something useful. – Erik Alapää Feb 23 '16 at 15:12
  • @ErikAlapää: Full-sized unsigned types (which are not smaller than int) behave like members of a wrapping algebraic numbers, rather than numbers. Small signed types (smaller than int) are often required by specification to do likewise, though not quite as consistently. Full-sized signed types and small unsigned types behave like numbers in all defined cases, but their behavior is not always defined. – supercat Feb 23 '16 at 17:55
  • @supercat Yes, for some uses, it is nice to have unsigned types as commutative, finite rings, e.g. since they wrap around nicely. – Erik Alapää Feb 24 '16 at 08:36
  • @ErikAlapää: I wish the Standard would add language to so that small unsigned types used with certain operators will promote to full-sized unsigned types in certain constructs where doing so would yield the same result as signed types in all cases where the latter behavior is specified. As it is, many implementations' small signed types behave more like rings than their small unsigned types. – supercat Feb 24 '16 at 08:49
  • @supercat Yes, I guess the promotion (if type fits in signed int, promote to signed, else unsigned) rules wreak havoc with small unsigned type's ring structure. – Erik Alapää Feb 24 '16 at 08:53
  • @ErikAlapää: On systems with two's-complement integers, the ring structure would hold if compilers simply ignored overflows. Problems only arise when compilers use the fact that certain combinations of operand values would cause overflow to infer that such combinations of values will never occur; I question whether the value of such inferences in cases where arithmetic results would otherwise behave as a ring comes anywhere close to the havoc such inferences can cause with code that expects a ring. – supercat Feb 24 '16 at 09:09
3

Usual arithmetic conversions at play.

u is converted to int before multiplication. Since int is signed it behaves differently on division.

printf("%08x\n", (u*0x10001)/0x100);
printf("%08x\n", (v*0x10001)/0x100);

Returns

ffaabbab
00aabbaa

Strictly speaking multiplication overflow on signed integer is already undefined behaviour, so result is invalid even before the division.

too honest for this site
  • 12,050
  • 4
  • 30
  • 52
user694733
  • 15,208
  • 2
  • 42
  • 68
  • The problem isn't actually the usual arithmetic conversions though. The problem is that the integer literal is of signed type. Simply adding a `u` suffix to the literal will solve the bug, because then "the usual" will come in and play, and balance all types to be of `unsigned int`. – Lundin Feb 23 '16 at 14:35
  • @Lundin: If the unsigned short weren't promoted to a signed type, the signedness of the literal wouldn't matter. – supercat Feb 23 '16 at 17:56
  • @supercat But it is, so you will either have to explicitly cast the unsigned short into a larger unsigned type, or change the type of the literal. The most pedantic, MISRA-ish code would be something like `((uint32_t)u * 0x10001u) / 0x100u` where you have killed every single implicit promotion. The code has now prevented conversion both from integer promotion rule and from the usual arithmetic conversions. – Lundin Feb 25 '16 at 16:01
  • @Lundin: You say "the problem isn't the usual arithmetic conversions"; my point is that it is the way in which such conversions are performed that creates the particular problem at issue. – supercat Feb 25 '16 at 16:30
1

The result of u*0x10001 is int= causing an overflow of signed type and thus undefined behavior.

Mohit Jain
  • 30,259
  • 8
  • 73
  • 100
1

Presuming 16 bit short and 32 bit int (typical for x86, ARM and most other 32 bit systems):

You have two types of undefined behaviour (UB) in your code. First, you use the wrong type-specifiers in the format strings. %x expects an unsigned int, while you pass an unsigned short extended to signed int.

Second - and the one you see here is the first calculation: u is converted to int (integer promotions) - not unsigned int for the multiplication, because the constant 0x10001 is also int. The multiplication invokes UB as it generates signed integer overflow. Once you invoke UB, you are lost and any further interpretation is useless.

Said that, we are now speculating: what happens is that after the multiplication, you likely have a negative value and as the division rounds towards zero (this is a standard requirement), you get the higher negative value. But as you print as unsigned, you see a larger raw (unsigned) value. This because of the 2's complement internal representation of negative values.


Note that this outcome is beyond the C standard. In fact the compiler could generate code to format your hard drive or your computer could jump out of the window or nasal daemons could appear. So, correct the errors:

  • use %hx to print an unsigned short int
  • e.g. use u * 0x10001U to enforce conversion to unsigned int for the multiplication. In general it is recommended to always use the U (unsigned) suffix if you work with unsigned values.
too honest for this site
  • 12,050
  • 4
  • 30
  • 52
1

I extended your code a little bit to explain:

#include "stdio.h"
int main( void ){
  unsigned short u = 0xaabb;
  unsigned int   v = 0xaabb;

  printf ("not casted:\n");
  printf ("%08x %08x\n", u, ((u*0x10001)/0x100));
  printf ("%08x %08x\n", v, ((v*0x10001)/0x100));

  printf ("unsigned short casted:\n");
  printf ("%08x %08x\n", u, (unsigned short)((u*0x10001)/0x100));
  printf ("%08x %08x\n", v, (unsigned short)((v*0x10001)/0x100));

  printf ("u*0x10001:\n");
  printf ("x=%08x d=%d\n", u*0x10001, u*0x10001);

  // Solution
  printf ("Solution:\n");
  printf (">>> %08x %08x\n", u, (unsigned short)((u*0x10001UL)/0x100UL));
  printf (">>> %08x %08x\n", v, (unsigned short)((v*0x10001UL)/0x100UL));
  return 0;
}

This leads to the following output:

not casted:
0000aabb ffaabbab
0000aabb 00aabbaa
unsigned short casted:
0000aabb 0000bbab
0000aabb 0000bbaa
u*0x10001:
x=aabbaabb d=-1430541637
Solution:
>>> 0000aabb 0000bbaa
>>> 0000aabb 0000bbaa

So what you are seeing that the operation u*0x10001 will generate an signed int (32 Bit) value and due to this your result is d=-1430541637. If you divide this value with 0x100 you will get the result you got 0xFFAABBAB. If you are casting this value with unsigned short as you did, you get your result = 0x0000BBAB. If you want to prevent this, that the compiler uses unsigned values for this operation you have to write UL as an extension to the numbers.

So you see the compiler is working as expected. You can compile it by yourself here Code[^].

Frodo
  • 749
  • 11
  • 23