1

Let us consider an 8-bit processor to simplify my question. I know that -2 is stored as its 2's complement, which is 0b1111_1110, and that the decimal representation of this chunk of data is 254, right? Now, my question is: how do ARM processors differentiate between "-2" and "254", since both have the same binary representation?

I tried searching the internet, but everybody keeps explaining how negative numbers are stored by processors. All I need to know is how they are distinguished.

Kavi
  • You need to know about 2's complement; that's how the CPU represents signed numbers. – danglingpointer Sep 13 '19 at 07:46
  • A lot of good answers below. Signed vs. unsigned is something the programmer cares about, not the processor (ARM is not relevant here; each processor has ways to deal with this). Addition and subtraction are not affected; they can't tell signed from unsigned, which is the beauty of two's complement. Multiplication cares because of sign extension: for an N * N bit multiply, the lower N bits of the result are not affected, but since it takes 2*N bits to store the complete result, the upper N bits differ. So you need a signed and an unsigned multiply, or you solve it another way, which you can. Divide is similar. – old_timer Sep 13 '19 at 11:21
  • If you need, one of us can show this pencil-and-paper style (long division, long multiplication), where you can see all of the bits in play. Once you see that, you can apply it to each individual instruction set: ARM, x86, etc. – old_timer Sep 13 '19 at 11:22

5 Answers

10

-2 is stored as its 2's complement, which is 0b1111_1110, and the decimal representation of this chunk of data is 254, right?

Yes, this is true for typical modern systems.

Now, my question is: how do ARM processors differentiate between "-2" and "254", since both have the same binary representation?

The processor doesn't; the compiler does.

Let's say you have the expression value > 0. The variable value and the constant 0 both have types. Depending on these types, the compiler chooses which CPU instructions to use. So a signed and an unsigned comparison can result in different compiler output.

The processor doesn't know about the types in your code. It simply executes the instructions the compiler selected.

Example with ARM64 gcc:

int icmp(int num) {
    return num > 0;
}

int ucmp(unsigned int num) {
    return num > 0;
}

icmp:
        sub     sp, sp, #16
        str     w0, [sp, 12]
        ldr     w0, [sp, 12]
        cmp     w0, 0
        cset    w0, gt
        and     w0, w0, 255
        add     sp, sp, 16
        ret
ucmp:
        sub     sp, sp, #16
        str     w0, [sp, 12]
        ldr     w0, [sp, 12]
        cmp     w0, 0
        cset    w0, ne
        and     w0, w0, 255
        add     sp, sp, 16
        ret

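Note how the compiler generated a slightly different cset instruction: the signed version uses the gt (signed greater than) condition, while the unsigned version uses ne (not equal), because an unsigned value is greater than 0 exactly when it is nonzero.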
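The same effect can be seen from C without reading assembly. The following is a minimal sketch, not taken from the answer itself; the helper names is_positive_signed and is_positive_unsigned are made up for illustration. Both receive the same 8 bits, and only the type used for the comparison differs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the 8 bits as a signed value, then compare. */
int is_positive_signed(uint8_t bits) {
    int8_t s;
    memcpy(&s, &bits, sizeof s);  /* same bits, now typed as int8_t */
    return s > 0;                 /* signed comparison */
}

/* Hypothetical helper: compare the same 8 bits as an unsigned value. */
int is_positive_unsigned(uint8_t bits) {
    return bits > 0;              /* unsigned comparison: true iff nonzero */
}
```

With the bit pattern 0xFE, is_positive_signed returns 0 (the value is read as -2) and is_positive_unsigned returns 1 (the value is read as 254).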

user694733
5

Most processors, including Arm processors, do not distinguish between signed and unsigned numbers. A byte containing 0b1111_1110 can be interpreted as an unsigned integer with the value 254 or as a signed integer with the value -2. Or it can be interpreted as something else, such as a floating-point number, a fixed-point number, a character, etc. What determines the interpretation is the operations you perform on it.

For many instructions, it doesn't matter whether a value is a signed integer or an unsigned integer: the two's complement representation of signed integers is designed to make this cheap, because arithmetic works modulo 2^N, where N is the word size in bits. For example, adding two values of the same size is just an add instruction; it doesn't matter whether the values are signed or not.

For some operations, the processor provides different instructions. For example, there are two sets of instructions to copy a value to a larger register: SXTB (Sign Extend Byte) and friends, and UXTB (Zero Extend Byte) and friends. The UXT* instructions copy a value to the low-order bits of the target register and set the high-order bits to zero. The SXT* instructions copy a value to the low-order bits of the target register and set the high-order bits to the high-order bit of the value, i.e. they interpret this high-order bit as a sign bit.

From a C perspective, it's the compiler's job to use the correct instructions depending on the operands. For example, if the compiler sees

uint8_t x = 0xfe;
uint32_t y = x + 3;

and it decides that the best way to compile that is to store x in the low-order bits of a 32-bit register and y in another 32-bit register, it will emit a UXTB instruction to set the register for y to 0x000000fe, then an ADD instruction to get the desired value of y. (Of course, in practice this snippet will be optimized away.)
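To make the difference between the two kinds of widening concrete, here is a small sketch in C. The function names widen_unsigned and widen_signed are hypothetical; which instruction a compiler actually emits (UXTB, SXTB, or something else entirely) depends on the target and the optimization level, so treat the comments as an assumption about typical output.

```c
#include <stdint.h>

/* Zero extension: the high 24 bits become 0
   (the kind of operation UXTB performs). */
uint32_t widen_unsigned(uint8_t v) {
    return v;
}

/* Sign extension: the high 24 bits become copies of bit 7
   (the kind of operation SXTB performs). */
int32_t widen_signed(int8_t v) {
    return v;
}
```

widen_unsigned(0xfe) gives 0x000000fe (254), so adding 3 yields 257. widen_signed(-2) gives -2, whose 32-bit pattern is 0xfffffffe.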

Gilles 'SO- stop being evil'
3

2's complement representation has the nice property that addition and subtraction don't have to care whether the numbers are signed or unsigned, as long as there is no overflow. 0b11111110 + 0b00000001 gives 0b11111111, which is -2 + 1 = -1 when interpreted as a signed value, or 254 + 1 = 255 when interpreted as unsigned.
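The addition above can be reproduced in C. This is an illustrative sketch with made-up helper names (add8, as_signed8); it models an 8-bit adder using unsigned arithmetic, which is well defined in C, and then reads the result both ways.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit addition modulo 256, as an 8-bit adder computes it. */
uint8_t add8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Hypothetical helper: read an 8-bit pattern as a two's complement value. */
int as_signed8(uint8_t bits) {
    return bits >= 0x80 ? (int)bits - 256 : (int)bits;
}
```

add8(0xFE, 0x01) returns 0xFF. Read as unsigned that is 255, and as_signed8(0xFF) gives -1: one adder, two interpretations.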

When signedness does matter, there are different machine code instructions for signed and unsigned, like SMULL and UMULL for signed and unsigned multiplication. Comparing values works by using different condition code suffixes, i.e. checking a different set of flags for signed and unsigned types in the instructions following the comparison, e.g. BLE for signed <= and BLS for unsigned <=.
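Why multiplication needs two variants can be sketched with an 8 x 8 -> 16 bit multiply in C. The helpers umul8, smul8 and as_signed8 are hypothetical names used only for this illustration; they are a scaled-down model of a widening multiply, not a description of how SMULL and UMULL are implemented.

```c
#include <stdint.h>

/* Hypothetical helper: read an 8-bit pattern as a two's complement value. */
int as_signed8(uint8_t bits) {
    return bits >= 0x80 ? (int)bits - 256 : (int)bits;
}

/* Unsigned widening multiply: operands are read as 0..255. */
uint16_t umul8(uint8_t a, uint8_t b) {
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* Signed widening multiply: operands are read as -128..127,
   and the 16-bit result is returned as a raw bit pattern. */
uint16_t smul8(uint8_t a, uint8_t b) {
    return (uint16_t)(as_signed8(a) * as_signed8(b));
}
```

For the inputs 0xFE and 0x03, umul8 gives 0x02FA (254 * 3 = 762) while smul8 gives 0xFFFA (-2 * 3 = -6). The low byte, 0xFA, is the same in both; only the upper byte depends on signedness.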

3

A CPU doesn't know the data type of something stored at a particular memory location; data types are something that only exist in programming languages. It is up to the compiler and/or the programmer to keep track of which type is stored at a certain location. When programming in C, the compiler does this job for you, most of the time.

When we say that the "CPU is 2's complement", we refer to the behavior of the signed arithmetic instructions. That is, when you run a CPU instruction that executes 0 - 1, the result is the binary number 1111 .... 1111b, and the appropriate flag in the condition code register gets set, indicating a negative result.

The programmer can either use or choose to ignore the "negative flag", in which case the machine code has executed a well-defined underflow. Unfortunately, the concept of well-defined overflow/underflow of signed numbers does not exist in C. So if we produce signed overflow/underflow in C, the compiler might generate incorrect code. This will never happen if we do the same in assembler, since the behavior is well-defined at the CPU level.
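One way to get the CPU-style wraparound in C is to do the arithmetic on unsigned types, where wrapping modulo 2^N is well defined. The sketch below models the 0 - 1 case on 8 bits; sub8 and negative_flag8 are made-up names, and negative_flag8 only imitates what a hardware negative flag reports (the top bit of the result), it does not read any real status register.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit subtraction that wraps modulo 256. */
uint8_t sub8(uint8_t a, uint8_t b) {
    return (uint8_t)(a - b);
}

/* Hypothetical helper: model of a "negative" flag, i.e. bit 7 of the result. */
int negative_flag8(uint8_t result) {
    return (result >> 7) & 1;
}
```

sub8(0, 1) returns 0xFF (all bits set), and negative_flag8(0xFF) returns 1. Whether the program then treats 0xFF as -1 or as 255 is up to the code that uses it.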

Lundin
-1

This depends on how a variable is declared.

For an 8-bit variable, you have the standard types uint8_t and int8_t for unsigned and signed values respectively.

e.g.

#include <stdint.h>

uint8_t a = 254;
int8_t b = -2;

If you are experimenting with this, you should also study the integer promotion rules.
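As a small illustration of why the promotion rules matter here, consider comparing a uint8_t holding 254 with an int8_t holding -2. The function names below are hypothetical. In C, both operands of == are promoted to int before the comparison, so the identical 8-bit patterns compare as different values unless one side is converted first.

```c
#include <stdint.h>

/* Both operands are promoted to int: 254 == -2 is false. */
int equal_after_promotion(uint8_t a, int8_t b) {
    return a == b;
}

/* Converting b to uint8_t first compares the raw 8-bit patterns. */
int equal_as_bytes(uint8_t a, int8_t b) {
    return a == (uint8_t)b;
}
```

With a = 254 and b = -2, equal_after_promotion returns 0 and equal_as_bytes returns 1.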

Rishikesh Raje