how does c compiler handle unsigned and signed integer? Why the assembly code for unsigned and signed arithmetic operation are the same?

Question

I am reading the book: CS-APPe2. C has unsigned and signed int type and in most architectures uses two's-complement arithmetic to implement signed value; but after learning some assembly code, I found that very few instructions distinguish between unsigned and signed. So my question are:

Is it the compiler's responsibility to differentiate signed and unsigned? If yes, how does it do that?
Who implements the two's-complement arithmetic - the CPU or the complier?

Add some more info:

After learning some more instructions, actually there are some of them differentiate between signed and unsigned, such as setg,seta,etc. Further, CF and OF apply to unsigned and respectively. But most integer arithmetic instructions treat unsigned and signed the same,e.g.

int s = a + b

and

unsigned s = a + b

generate the same instruction.

So when executing ADD s d, should the CPU treat s&d unsigned or signed? Or it is irrelevant, because the bit pattern of both result are the same and it is the compiler's task to convert the underlying bit pattern result to unsigned or signed?

P.S i am using x86 and gcc

"C has unsigned and signed int type and use two's-complement arithmetic to implement signed value. We all know it." - and y'all know it incorrectly. 2's complement is popular but not universal. The C standard permits an implementation to use 1's complement or sign+magnitude representation for signed integers. — , Oct 19 '13 at 08:49
As to the questions: 1. yes, by looking at the types of the variables and emitting different assembly when necessary; 2. the CPU. — , Oct 19 '13 at 08:50
Difference between unsigned and signed numbers in lower level will only manifest itself when such numbers are extended or truncated (sext/zext operations and alike). Arithmetics for 1's complement numbers is the same for signed and unsigned. — SK-logic, Oct 19 '13 at 08:54
@H2CO3,so can we say that from the CPU's perspective all integers are signed but the complier differentiate them? — tomwang1013, Oct 19 '13 at 08:56
In addition to SK-logic: look at division and comparisons and you will find differences between signed and unsigned. From the cpu's perspective any data is just a set of bits or bytes. — Bryan Olivier, Oct 19 '13 at 08:57
@user1446907 No, CPUs tend to have both signed and unsigned operations. But again, this is CPU-specific. Some do, some don't. — , Oct 19 '13 at 09:00
@SK-logic: That was more answer than comment - I suggest you post it as such - especially if you can come up with some worked examples. — Clifford, Oct 19 '13 at 09:01
http://www.swarthmore.edu/NatSci/echeeve1/Ref/BinaryMath/NumSys.html — David Ranieri, Oct 19 '13 at 09:06
On 2: irrelevant. Just like there are CPUs without native support for floating point arithmetic, imagine there is a CPU without "signed/unsigned" operations. The compiler needs to work around that, or it's not a good C compiler. — Jongware, Oct 19 '13 at 09:53
@H2CO3, i add some example, could you read my question and help answer again? — tomwang1013, Oct 19 '13 at 10:50
I believe there is some checking of overflow flags that happens after the actual addition - if overflow occurs, the unsigned result is well defined but the unsigned one is not. See http://stackoverflow.com/q/16056758/1967396 — Floris, Oct 19 '13 at 10:57
@Floris Overflow on signed numbers is undefined behavior, so the compiler can do anything it likes, for example just wrap around. There is no obligation to check for overflow and most compilers don't. — Bryan Olivier, Oct 19 '13 at 12:53
@BryanOlivier - according to links in the question I linked earlier, the C standard permits overflow in unsigned integers, and will treat them `modulo` (1+the largest number that can be represented) - in other words, numbers wrap (but only for unsigned integers). Do you have documentation that that is not so? — Floris, Oct 19 '13 at 14:40
@Floris Overflow on unsigned numbers is indeed well defined in the C standard. On signed numbers however it is undefined behavior, but in practice most compilers will still do the module (wrap around). As a result the code for signed and unsigned is still the same. — Bryan Olivier, Oct 19 '13 at 17:41
@BryanOlivier - I didn't realize that. So it's only when you convert the number to string (for printing etc) or float/double (to do floating point arithmetic), that you really need to worry about the signed/unsigned nature of the number. — Floris, Oct 19 '13 at 18:41
@Floris That is what this question is all about. For many operations there is no difference, but for many others there is: converting to a bigger size integral type, division, shift right, comparison etc. — Bryan Olivier, Oct 20 '13 at 08:38
@BryanOlivier: for your examples, you get different *instructions* for signed and unsigned values (and the CPU still doesn't 'know'). F.i., "converting to a bigger type": promoting unsigned char `al` to int `ebx` is `xor ebx,ebx; mov bl,al`. The same for a signed char: `movsx ebx, al`. — Jongware, Oct 20 '13 at 10:27
Note that `unsigned s = a + b` and `int s = a + b` **are** the same operation apart from the possible conversion of the result. In C, the type of the destination, `s`, does not influence the type of the computation, `a + b`, which is determined only from the types of `a` and `b`. — Pascal Cuoq, Oct 20 '13 at 10:29
@Jongware That is what I meant to say. All the examples I mention should give different instructions. — Bryan Olivier, Oct 20 '13 at 10:38
Possible duplicate of [Which arithmetic operations are the same on unsigned and two's complement signed numbers?](https://stackoverflow.com/questions/21475286/which-arithmetic-operations-are-the-same-on-unsigned-and-twos-complement-signed) — phuclv, Aug 13 '18 at 16:49

Clifford · Answer 1 · 2013-10-19T20:08:47.393

In many cases there is no difference at the machine level between signed and unsigned operations, and it is merely a matter of interpretation of the bit pattern. For example consider the following 4-bit word operation:

Binary Add  Unsigned   2's comp
----------  --------   --------
  0011          3         3
+ 1011       + 11       - 5
-------     --------   --------
  1110         14        -2  
-------     --------   --------

The binary pattern is the same for the signed and unsigned operation. Note that subtraction is merely the addition of a negative value. When a SUB operation is performed, the right-hand operand is two's complemented (invert bits and increment) then added (the ALU circuit responsible is an adder); not at the instruction level you understand, but at the logic level, although it would be possible to implement a machine without a SUB instruction and still perform subtraction albeit in two instructions rather than one.

There are some operations that do require different instructions depending on the type, and it is the compiler's responsibility to generate appropriate code generally speaking - architectural variations may apply.

"When a SUB operation is performed, the right-hand operand is complemented then added" -- I think you missed a correction of one there. *Complement* usually refers to the bitwise complement (`~x`), and for two's complement, that is not the same operation as negating it. — , Oct 19 '13 at 09:54
@clifford, i add some example, could you read my question and help answer again? — tomwang1013, Oct 19 '13 at 10:49

Pavel Šimerda · Accepted Answer · 2014-04-14T07:19:50.177

It's quite easy. Operations like addition and subtraction don't need any adjustment for signed types in two's complement arithmetic. Just perform a mind experiment and imagine an algorithm using just the following mathematical operations:

increment by one
decrement by one
compare with zero

Addition is just taking items one by one from one heap and putting them to the other heap until the first one is empty. Subtraction is taking from both of them at once, until the subtracted one is empty. In modular arithmetics, you just just treat the smallest value as the largest value plus one and it works. Two's complement is just a modular arithmetic where the smallest value is negative.

If you want to see any difference, I recommend you to try operations that aren't safe with respect to overflow. One example is comparison (a < b).

Is it the complier's responsibility to differentiate signed and unsigned? If yes, how does it do that?

By generating different assembly whenever needed.

Who implements the two's-complement arithmetic - the CPU or the complier?

It's a difficult question. Two's complement is probably the most natural way to work with negative integers in a computer. Most operations for two's complement with overflow are the same as for unsigned integers with overflow. The sign can be extracted from a single bit. Comparison can be conceptually done by subtraction (which is signedness-agnostic), sign bit extraction and comparison to zero.

It's the CPU's arithmetic features that allow the compiler to produce computations in two's complement, though.

unsigned s = a + b

Note that the way plus is computed here don't depend on the result's type. Insead it depends on the types of variables to the right of the equal sign.

So when executing ADD s d, should the CPU treat s&d unsigned or signed?

CPU instructions don't know about the types, those are only used by the compiler. Also, there's no difference between adding two unsigned numbers and adding two signed numbers. It would be stupid to have two instructions for the same operation.

I think you meant to say: "CPU instructions **don't** know about the types". — Bryan Olivier, Oct 20 '13 at 08:46
@paval, so in `s = a + b`, it is the complier's task to convert the bit result of `a + b` to the type of s, i.e. unsigned or signed. E.g. say that the bit result of `a + b` is 1001(assume integer is 4 bit), so if s is unsigned, the compiler treat 1001 as 9, otherwise, it is treated as -7. All these conversion are done by the compiler, NOT the CPU. Am i right? — tomwang1013, Oct 20 '13 at 08:49
@user1446907 Nope. The compiler doesn't and cannot possibly interpret runtime values as it's not running at that time. It merely decides which instructions will be used to perform operations on those values. Only literals (numbers written directly to the source file) are interpretted by the compiler but those naturally default to signed. — Pavel Šimerda, Oct 20 '13 at 10:18
Fixed: „It's the CPU's arithmetic features that allow the compiler to produce computations in two's complement, though.“ It was wrong as well as confusing the matter. — Pavel Šimerda, Oct 20 '13 at 10:20
@PavelŠimerda 'conversion' is not suitable, i should say "make the compiler treat or interpret a + b as the same type of s by s = a + b". — tomwang1013, Oct 21 '13 at 09:39
@user1446907: You originally wanted to affect the meaning of the plus sign, which can't be done by converting/assigning the result, as at that time the addition is already over. — Pavel Šimerda, Oct 21 '13 at 14:39

score 1 · Answer 3 · answered Oct 20 '13 at 12:08

1

There's no need to differentiate signed and unsigned ints for most arithmetic/logic operations. It's often only need to take the sign into account when printing, zero/sign extending or comparing values. In fact the CPU doesn't know anything about a value's type. A 4-byte value is just a series of bits, it doesn't have any meaning unless the user points out that's a float, an array of 4 chars, an unsigned int or signed int, etc. For example when printing a char variable, depending on the type and output properties indicated, it will print out the character, unsigned integer or signed integer. It's the programmer's responsibility to show the compiler how to treat that value and then the compiler will emit the correct instruction needed to process the value.

answered Oct 20 '13 at 12:08

phuclv

37,963
15
156
475

I almost get it: although `a + b` output the same bit result, we can tell the compiler to treat it as unsigned or signed by `s = a + b` through s's type. Or by a print statement: `print("%xxx", a + b)`. – tomwang1013 Oct 21 '13 at 09:35
1

Yes, that's it. But passing to printf a different type than the format may invoke undefined behavior. It should be type-casted to the desired type first, but that's merely a type cast, the bit pattern is still unchanged – phuclv Oct 21 '13 at 11:37
@user1446907: It's technically wrong. You're not telling the compiler how to treat it (it knows it from the type of `a + b`), you're telling it to generate a sequence of instruction to convert it form the expression type to the variable type. – Pavel Šimerda Oct 21 '13 at 14:44
@PavelŠimerda thanks, your explanation is more concise and meanful, though i still have many questions. I need to learn more. – tomwang1013 Oct 21 '13 at 15:17
@user1446907: Just ask. It helps me to sort out my knowledge as well. – Pavel Šimerda Oct 21 '13 at 22:12

score 1 · Answer 4 · answered Oct 26 '16 at 18:03

A lot has been said about your first question, but I like to say something about your second one:

Who implements the two's-complement arithmetic - the CPU or the complier?

The C standard does not require negative numbers to have two's-complement, it doesn't define at all how hardware expresses negative numbers. The task of the compiler is it to translate your C code into CPU instructions that do what you your code requests. So whether the C compiler will create code for two's-complement arithmetic or not depends solely on the fact if your CPU uses two's-complement arithmetic or not. The compiler must know how the CPU works and create code accordingly. So the correct answer to that question is: The CPU.

If your CPU was using a one's-complement representation, than a C compiler for that CPU would emit one's-complement instructions. On the other hand, a C compiler can emulate support for negative numbers on a CPU that doesn't know negative numbers at all. As the two's-complement allows you to ignore if a number is signed or unsigned for many operations, this is not too hard to do. In that case it would have been the compiler implementing the two's-complement arithmetic. This could also be done on a CPU that has an representation for negative numbers, yet why should the compiler do that and not just use the native form the CPU understands? So it won't do that unless it has to.

*at all* is a slight overstatement. C does specify that an implementation will use either 2's complement, 1's complement, or sign/magnitude encoding for signed integers. (But the object representation is allowed to contain padding.) I think this guarantees that the place-values of the bits are in the normal order (modulo endianness), if you were to look at the object representation using unsigned char (which is guaranteed to be normal binary with no padding). i.e. so we can rule out having the sign bit be in the middle, with other bits on either side. Or any other esoteric encoding. — Peter Cordes, Dec 05 '18 at 00:32
@PeterCordes See 2nd paragraph. The C standard says that numbers in C can be either, or, or, but it doesn't say "C cannot run on hardware where this is not the case". If you have hardware that uses a different representation, it can still support a C compiler, but that compiler must then choose one of the representations you mentioned for storage and for calculations either do them in software or convert to the native format, do the calculation and convert back to the chosen storage format, probably whatever is faster. In that rare case, the implementation would define it, that's correct. — Mecki, Dec 05 '18 at 20:14

plugwash · Answer 5 · 2023-05-03T19:46:23.050

It is the hardware's job to provide a useful set of instructions. It is the compiler's job to translate the abstract high level language to those instructions.

The hardware stores and manipulates patterns of bits. Meaning is only assigned to a pattern of bits when we perform an operation on it. If we perform a "signed divide" operation on two registers we are telling the hardware or library function to assume the bits in those registers represent signed numbers. If we perform an addition operation on two registers we are telling the hardware to assume those bit patterns represent numbers, we are not however telling it if those numbers are signed or unsigned. The hardware has no reason to know or care about that.

Two's complement is the most popular format for signed integers because it requires the least additional work from the hardware. In particular if addition, subtraction and non-widening multiplication are implemented with modulo 2ⁿ wraparound (which is the easiest way to implement them) then those same operations can be used for both signed and unsigned arithmetic.

That said, there are operations that are different for signed and unsigned numbers and the compiler or assembler programmer needs to be aware of them.

Comparison and overflow detection

Equality comparison is the same in signed and unsigned arithemtic. However inequality comparisons are not. 255 is after all greater than zero while -1 is smaller than zero. Similarly overflow detection is different, the calculation of 255+255 in 8 bit unsigned arithmetic would overflow, but the calculation of (-1) + (-1) in 8 bit signed arithmetic would not.

However most CPUs do not have dedicated signed and unsigned comparison operations. Instead most CPUs have a set of flags where some flags are meaningful for unsigned operations and different flags are meaningful for signed operations.

The flags may be set as a result of a regular arithmetic operation. Most CPUs also have a dedicated compare operation which effectively performs a subtraction and sets the flags but discards the result.

For example the set of flags from arm32 is (and these are fairly typical).

Z: zero, used to detect if the result was zero (for a compare instruction this means the operands were equal)
C: Carry, used when constructing larger arithmetic operations out of smaller ones. Also used to detect overflow in unsigned arithmetic.
V: Overflow, Set when a signed arithmetic operation overflows.
N: Negative, Set if the most significant bit of the result is set. That is set when the result of a signed arithmetic operation is negative.

The CPU does not know or care whether the user is performing a signed or unsigned operation, it always sets all the flags based on a set of simple rules. It is up to the compiler (or programmer if writing in assembler) to know what flags are meaningful for their operation and choose an appropriate conditional instruction.

The conditional instructions in turn look at various combinations of these flags. Conditional instructions intended for use in code working on unsigned values look at the C and Z flags, while conditional instructions intended for use in code working on signed values look at the V, Z and N flags.

Widening

Converting a number from a smaller type to a larger one is known as widening. For unsigned numbers the extra bits must be filled with zero. For signed numbers the extra bits must be filled with copies of the sign bit.

Some architectures may have dedicated instructions for widening or may incorporate widening functionality into other instructions. For example on 32-bit arm all arithmetic instructions work on 32-bit words. On the original arm CPU the "load byte" instruction always zero-extended the byte. However with armv4 additional load instructions were added, so signed bytes and signed and unsigned halfwords could be loaded and extended in a single operation.

Similarly recent versions of 32-bit arm have specific sign extension instructions which discard the upper part of a register and replace it's contents with a sign or zero extension of the lower part.

In other cases, widening may be performed using more generic instructions. This may include shift instructions and bitwise operations. For example extending a single word value into a double word value may use a shift by one bit less than the word size (e.g. 31 for a 32-bit word size) to construct a new word consisting entirely of duplicates of the first words signed bit.

Widening multiply

A widening multiply typically produces a result twice as big as it's arguments. For example a 32x32 widening multiply would normally produce a 64-bit result. In some cases there may be a single instruction that performs the complete widening multiply, in others a "multiply high" instruction may be provided that only provides the top part of the result.

Widening multiply is useful in a number of applications including fixed-point arithmetic and as a building block to perform arithmetic on numbers wider than the word size.

When processors offer widening multiply they usually offer separate versions of it for signed and unsigned operation.

bit shifts

Bit shifts can be used to multiply or divide numbers by powers of two.

Shifting "left" to make a number larger is the same for both signed and unsigned numbers. However shifting "right" to make a number smaller is different, for unsigned numbers the extra places should be filled with zeros, while for signed numbers they should be filled with copies of the signed bit.

Most CPUs offer separate right shift instructions for signed and unsigned operation. These are known (presumably for historical reasons) as "logical shift" (for unsigned operations) and "arithmetic shift" for signed operations.

Be aware that some higher level languages, particularly C and older versions of C++ leave the behaviour of shift implementations on negative numbers implementation defined.

Division

Division is different for signed and unsigned numbers as can be readily seen. (-2)/(-1) = 2 but 254 / 255 = 0.

In my experience, many CPUs do not offer any division instructions at all, but when division instructions are provided, usually both signed and unsigned versions are available.

KawaiKx · Answer 6 · 2013-10-21T15:05:28.047

This bothered me too for a long time. I didn't get to know how compiler works as a program while handing its defaults and implicit instructions. But my search for an answer led me to following conclusions :

Real world uses signed integers only, since the discovery of negative numbers. that is the reason int is considered a signed integer by default in the compiler. I totally ignore the unsigned number arithmetic since it is useless.

CPU has no clue of signed and unsigned integers. It just knows bits - 0 and 1. how you interpret its output is up to you as an assembly programmer. That makes assembly programming tedious. Dealing with integers (signed & unsigned) involved lot of flag-checking. That is why high-level languages were developed. compiler takes all the pain away.

How compiler works is a very advance learning. I accepted that at present it is beyond my understanding. This acceptance helped me to move on in my course.

In x86 architecture:

add and sub instructions modify flags in the eflags register. These flags can then be used in conjunction with adc and sbb instructions to build arithmetic with higher precision. In such case, we move the size of the numbers into ecx register. The number of times loop instruction is carried out is the same as the size of the numbers in bytes.

Sub instruction takes the 2's complement of the subtrahend, add it to the minuend, invert the carry. This is done in hardware (implemented in circuit). Sub instruction 'activates' a different circuit. After using the sub instruction, programmer or compiler checks the CF. If it is 0, the result is positive and the destination has correct result. If it is 1, the result is negative and the destination has the 2's complement of the result. Normally, the result is left in 2's complement and read as a signed number, but the NOT and INC instructions can be used to change it. The NOT instruction performs the 1's complement of the operand, then the operand is incremented to get the 2's complement.

When a programmer has planned to read the result of an add or sub instruction as a signed number, he shall be watch the OF flag. If it is set 1, the result is wrong. He should sign-extend the numbers before running the operation between them.

I'm tempted to give a +1 (not that you actually needed it), I'd just need some information (e.g. about sub, and correction) to be backed by easily understandable but reliable sources. — Pavel Šimerda, Oct 21 '13 at 22:17
@PavelŠimerda I learnt about sub instruction from this book : http://www.amazon.com/x86-PC-Assembly-Language-Interfacing/product-reviews/0135026482/ref=dp_top_cm_cr_acr_txt?ie=UTF8&showViewpoints=1 the book is old and it is about assembly programming in windows inside an emulator. — KawaiKx, Oct 22 '13 at 00:03
Yes CF tells you whether an unsigned subtraction wrapped, but it can wrap so far that it's not in the `-2^31 .. 2^31-1` range of signed 2's complement anymore. e.g. `sub eax, ecx` with eax = 0, ecx = 0xffffffff` will leave EAX=1, CF=1. `-0xffffffff` = -4294967295 requires 33 bits to represent in 2's complement, and `+1` is merely the low 32 of those bits. — Peter Cordes, Jan 28 '19 at 02:25
Sure in some cases, subtraction of unsigned inputs will leave a correct 2's complement signed result, but you have to check more than CF to find out if that happened. — Peter Cordes, Jan 28 '19 at 02:26

score 0 · Answer 7 · answered Dec 04 '18 at 11:10

2's complement is just a map between decimal and binary number.

The compiler is implementing this mapping by translating literal number to corresponding binary, for example -3 to 0xFFFFFFFD(as can be seen in disassembly), and producing machine code that is consistent with 2's complement representation. For example, when it tries to execute 0-3, it should choose a instuction that should produce 0xFFFFFFFD by taking 0x00000000 and 0x000000003 as arguments.

The reason it chooses SUB which is same for unsigned subtraction, is it simply produces 0xFFFFFFFD as expected. No need to ask CPU to provide special SUB for signed subtraction. To say that second operand is inversed by 2's complement and by this deduce the conclusion that CPU implements 2's complement in SUB is unfair. Because borrowing from higher bit in subtraction happens to be the same as 2's complement inversing and SUB is also used for unsigned subtraction, no need to involve the concept of 2's complememt in SUB at all.

The following disassembly illustrate the fact that signed subtraction use same SUB as unsigned.

//int32_3 = -3;
010B2365  mov         dword ptr [int32_3],0FFFFFFFDh  
//int32_1 = 0, int32_2 = 3;
010B236C  mov         dword ptr [int32_1],0  
010B2373  mov         dword ptr [int32_2],3  
//uint32_1 = 0, uint32_2 = 3;
010B237A  mov         dword ptr [uint32_1],0  
010B2384  mov         dword ptr [uint32_2],3  
//int32_3 = int32_1 - int32_2;
010B238E  mov         eax,dword ptr [int32_1]  
010B2391  sub         eax,dword ptr [int32_2]  
010B2394  mov         dword ptr [int32_3],eax  
//uint32_3 = uint32_1 - uint32_2;
010B2397  mov         eax,dword ptr [uint32_1]  
010B239D  sub         eax,dword ptr [uint32_2]  
010B23A3  mov         dword ptr [uint32_3],eax

The CPU preserve additional imformation in CF and OF flags for further instructions that use result of SUB in different ways depending on the variable type that is assigned the result.

The following disassembly illustrate how compiler generates different instructions for signed compare and unsigned compare. Notice cmp includes a internal sub, and jle is based on OF flag and jbe is based on CF flag.

//if (int32_3  > 1)  int32_3 = 0;
010B23A9  cmp         dword ptr [int32_3],1  
010B23AD  jle         main+76h (010B23B6h)  
010B23AF  mov         dword ptr [int32_3],0  
//if (uint32_3 > 1) uint32_3 = 0;
010B23B6  cmp         dword ptr [uint32_3],1  
010B23BD  jbe         main+89h (010B23C9h)  
010B23BF  mov         dword ptr [uint32_3],0

It is the OF gives away the fact that CPU implements 2's complement, because the way the OF is set is when middle binary number 0x10000000 or 0x0FFFFFFF is exceeded. And 2's complement representation maps 0x10000000 to -268435456 and 0x0FFFFFFF to 268435455, which are the upper and lower limit of 32bit integer. So this OF flag is designed specifically for 2's complement, because other representation might choose to map other binary numbers to the upper and lower limit.

To conclude: 1. Compiler differentiates signed and unsigned arithmetics by implementing corresponding representations(mapping) and by generating instructions that the result of which comply with compiler's representation of signed and unsigned integer. 2. Compiler implements 2's complement representation and CPU also implement it to support compiler in generating arithmetic instructions that the result of which comply with 2's complement representatiom.

how does c compiler handle unsigned and signed integer? Why the assembly code for unsigned and signed arithmetic operation are the same?

7 Answers7

Comparison and overflow detection

Widening

Widening multiply

bit shifts

Division

Linked

Related