
I always find this confusing when I am looking at the disassembly of code written in C/C++.

There is a register with some value. I want to know if it represents a signed number or an unsigned number. How can I find this out?

My understanding is that if it's a signed integer, the MSB will be set if it is negative and not set if it is positive. If I find that it's an unsigned integer, the MSB doesn't matter. Is this correct?

Regardless, this doesn't seem to help: I still need to identify whether the integer is signed before I can use this information. How can this be done?

Peter Cordes
user1466594
    What language? Just 'assembly' is not very specific. – harold Jun 26 '12 at 10:41
    If this number is used in conditionals, you can look at which jumps are taken afterwards. For example, `ja` and `jb` are for unsigned values, `jg` and `jl` are for signed ones. Other instructions are also telling, for example `movsx` is for signed, `movzx` is for unsigned (though it's for target type, not the source type - consider the C/C++ promotion rules and explicit casting). – DCoder Jun 26 '12 at 10:43
    You cannot see that from the values alone. They are just bits, the interpretation of the bits decide what they represent. – Bo Persson Jun 26 '12 at 10:44
    harold: assembly only, DCoder: thanks..I read the exact same thing before.., Bo Persson: thanks.. – user1466594 Jun 26 '12 at 10:48
    @user1466594 what processor. x86 apparently. – harold Jun 26 '12 at 10:48
    You may also look at divisions, since they care about signedness. And double-width multiplications maybe, but they're quite rare. – harold Jun 26 '12 at 10:54

3 Answers


Your best bet is to look for comparisons and the flag usage that follows them, such as a conditional branch. The compiler generates different code depending on the type, because most (relevant) architectures provide separate flag conditions for signed and unsigned comparisons. Taking x86 as an example:

jg, jge, jl, jle = branch based on a signed comparison (they test SF and OF; jg and jle also test ZF)
ja, jae, jb, jbe = branch based on an unsigned comparison (they test CF; ja and jbe also test ZF)

Most instructions on a CPU will be the same for signed/unsigned operations, because we're using a Two's-Complement representation these days. But there are exceptions.

Let's take right shifting as an example. For unsigned values on x86 you would use SHR to shift right; it fills every newly vacated bit on the left with a zero.

For signed values, SAR is usually used instead, because it copies the MSB (the sign bit) into all the vacated bits. That's called "sign extension", and again it only works because we're using two's complement.

Last but not least, there are different instructions for signed and unsigned multiplication and division.

idiv or one-operand imul = signed
div or mul/mulx = unsigned

As noted in the comments, imul with 2 or 3 operands doesn't imply anything, because like addition, non-widening multiply is the same for signed and unsigned. Only imul exists in a form that doesn't waste time writing a high-half result, so compilers (and humans) use imul regardless of signedness, except when they specifically want a high-half result, e.g. to optimize uint64_t = u32 * (uint64_t)u32. The only difference will be in the flags being set, which are rarely looked at, especially by compiler-generated code.

Also the NEG instruction will usually only be used on signed values, because it's a two's complement negation. (If used as part of an abs(), the result may be considered unsigned to avoid overflow on INT_MIN.)

Peter Cordes
Nico Erfurth
    `imul` (the 2+ operand version) might as well be unsigned. And `je` (aka `jz`) is perfectly valid for unsigned comparisons (and non-comparisons, even). – harold Jun 26 '12 at 12:32
    Sorry, I forgot the "l" in jle. For 2+ operand imul you're right, it can be used for unsigned multiplication; I think the only difference is in setting the flags. I'll rework the answer. – Nico Erfurth Jun 26 '12 at 13:14
    `mul` and `imul` can be both used for multiplication of signed and unsigned integers when only the least significant half of the product is needed. – Alexey Frunze Jun 26 '12 at 19:44
    Please also correct the lists of conditional jumps. Unsigned: `J(N)A(E)`, `J(N)B(E)`, `J(N)C`. Signed: `J(N)G(E)`, `J(N)L(E)`. – Alexey Frunze Jun 26 '12 at 19:47

In general, you won't be able to. Many things that happen to integral values happen the same way for signed or unsigned values. Assignment, for example. The only way to tell is if the code happens to do something whose result depends on signedness, such as a comparison, a division, or a right shift. You absolutely can't tell by looking at the value; all possible bit patterns are valid either way.

bmargulies

In most processors (at least those that use two's complement math), there is no inherent signedness for the integers stored in registers or memory. The interpretation depends on the instructions used. A short summary:

  1. Addition and subtraction produce exactly the same bit patterns for signed and unsigned numbers, so usually there are no separate signed add or subtract instructions. (However, MIPS has separate instructions, add and sub, which cause a trap if the operation overflows, alongside addu and subu, which don't.)

  2. Division and widening multiplication (where the high half of the product is used) produce different results for signed vs. unsigned numbers, so if the processor supports them, they come in pairs (x86: mul/imul, div/idiv).

  3. Conditional branches may also differ depending on the interpretation of the comparison result (usually implemented as a subtraction). For example, on x86 there is jg for signed "greater" and ja for unsigned "above".

Note that floating-point numbers (at least in IEEE format) use an explicit sign bit, so the above does not apply to them.

Igor Skochinsky
    Multiplication actually produces the same result for signed and unsigned in the low half, which is usually the only half that's used. – harold Jun 26 '12 at 14:22