
I am learning assembly language and got stuck on this point. This is a problem from chapter 3 of the book "Computer System". The problem description is:

1st part of the problem

2nd part of the problem

Look at questions A, B and C.

A.

cmpl %eax, %edx
setl %al

Solution: The suffix ‘l’ and the register identifiers indicate 32-bit operands, while the comparison is for a two’s complement ‘<’. We can infer that data_t must be int.

B.

cmpw %ax, %dx
setge %al

Solution: The suffix ‘w’ and the register identifiers indicate 16-bit operands, while the comparison is for a two’s-complement ‘>=’. We can infer that data_t must be short.

C.

cmpb %al, %dl
setb %al

Solution: The suffix ‘b’ and the register identifiers indicate 8-bit operands, while the comparison is for an unsigned ‘<’. We can infer that data_t must be unsigned char.

My question is how to determine that the "comparison is for a two’s complement ‘<’", the "comparison is for a two’s-complement ‘>=’" and the "comparison is for an unsigned ‘<’". Also, I cannot understand how to determine the data type from this.

  • Each Solution seems pretty clear: you can tell the size from the last char of the `cmp?` operator and registers being used, and the type of comparison (and thus `signed` or `unsigned`) from the `set?` operator (see https://stackoverflow.com/q/44630262/535275). – Scott Hunter Feb 02 '22 at 19:36
  • Read https://c9x.me/x86/html/file_module_x86_id_288.html. Specifically in reference to "above" and "below." – General Grievance Feb 02 '22 at 19:36
  • @ScottHunter Can you please say how the solution says (determines) "Comparison is for a two's complement '>=' ". I can't understand this. – Md. Masud Mazumder Feb 02 '22 at 19:42
  • It's simply that [the condition codes](https://pushbx.org/ecm/doc/insref.htm#iref-cc) for `jcc`, `cmovcc`, and `setcc` are defined as "less or equal", "less", "greater or equal", "greater", respectively "below or equal", "below", "above or equal", "above". The signed comparison conditions are customarily called "L" and "G", the unsigned comparison conditions are instead called "B" and "A". Also, the `setl` is a form of [`setcc`](https://pushbx.org/ecm/doc/insref.htm#insSETcc); that "L" is the condition code, not a size specifier like in `cmpl`. – ecm Feb 02 '22 at 19:49
  • Note that `cmpl %eax, %edx` \ `setl %al` is indeed a comparison "for a two’s complement ‘<’ (less than)" but due to [the order of operands](https://stackoverflow.com/questions/2397528/mov-src-dest-or-mov-dest-src/60596999#60596999) in your AT&T syntax, `al` is actually set to 1 (true) if `edx` **is less than** `eax`. You have to swap the order of `cmp` operands into Intel order for the comparison condition to be applicable. – ecm Feb 02 '22 at 19:55

2 Answers


The first part (the data type) is straightforward. eax is a 32-bit register, so the data type is int (or more precisely int32_t). Similarly, ax is a 16-bit register and al an 8-bit register.

For the second part, you need to know the instructions. The Intel specification says (under the SETcc instruction):

The terms “above” and “below” are associated with the CF flag and refer to the relationship between two unsigned integer values. The terms “greater” and “less” are associated with the SF and OF flags and refer to the relationship between two signed integer values.

So setb operates on unsigned values, while setl and setge operate on signed values. "two’s complement" here means the same as "signed".
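As a rough illustration, the three snippets in the question are the kind of code a compiler might emit for comparison functions like the ones below. The function names are made up for this sketch, and the exact instructions depend on the compiler, target and optimization flags.

```c
/* Hypothetical functions; the names are illustrative only. */

/* 32-bit signed operands: a compiler may use cmpl followed by setl. */
int less_int(int a, int b) { return a < b; }

/* 16-bit signed operands: a compiler may use cmpw followed by setge. */
int ge_short(short a, short b) { return a >= b; }

/* 8-bit unsigned operands: a compiler may use cmpb followed by setb. */
int below_uchar(unsigned char a, unsigned char b) { return a < b; }
```

For example, less_int(-1, 1) yields 1 because -1 is less than 1 as a signed value, while below_uchar(255, 1) yields 0 even though 255 has the same low 8 bits as -1.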

PMF
  • Thanks for your answer. I need one more clarification. As far as I know, setl, setb etc. just check the current state of the flag bits, which are modified by the cmp instruction. Then why are the set operations said to operate on signed or unsigned values? – Md. Masud Mazumder Feb 02 '22 at 19:50
  • The set commands set the target register to 1 (true) if the condition is true. This is determined by looking at the flags register. The CPU looks at different flags, depending on which condition is to be tested. For unsigned comparisons, the ZF and CF flags are tested; for signed comparisons, the ZF, SF and OF flags are tested. The two kinds of comparison give different results when exactly one of the values has the top bit (the sign bit) set. – PMF Feb 02 '22 at 20:03

For integers, the notion of data type goes to both width and signed-ness.

Data type is for variables and can be signed or unsigned.  Variables can hold values, and integer values can be negative or positive.

There are 4 standard widths in assembly: byte, word, long, and quad, each twice as large as the prior.  These do not by themselves indicate whether the value is signed or unsigned.  In the x86 world, word is 16 bits, whereas in most other environments (MIPS/RISC-V) word refers to 32 bits.  Further, long is sometimes called dword for double word, and quad is called qword for 8-byte values.

There are 4 standard integer widths in C: char, short, int, and long. In C they are generally understood as signed, except that char has implementation-defined signedness.  C guarantees that sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long), but to know exactly which is what you must consult the implementation's documentation; an implementation is supposed to tell you.  Many implementations have int and long both as 32 bits, but sometimes there are compiler options to change that, and long long is usually 64 bits.
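One way to find out what a given implementation chose is to ask the compiler itself. The following is a minimal sketch; the helper names are hypothetical, and the bit-width helper assumes int has no padding bits, which holds on common platforms.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the usual size ordering holds here. */
int sizes_are_ordered(void) {
    return sizeof(char) == 1
        && sizeof(char) <= sizeof(short)
        && sizeof(short) <= sizeof(int)
        && sizeof(int) <= sizeof(long);
}

/* Hypothetical helper: storage width of int in bits on this implementation
   (assumes no padding bits). */
size_t int_width_bits(void) {
    return sizeof(int) * CHAR_BIT;
}
```

On a typical x86 or x86-64 toolchain int_width_bits() would report 32, which matches the 32-bit registers used by cmpl.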

In C, we can add keyword signed or unsigned to ensure the data type is one that can hold negative values or cannot hold negative values, respectively.

For comparison operations broadly across both signed and unsigned data types, there are a total of 10 usual relations.  Note that programming languages and instruction sets generally do not provide relational operations that compare a signed operand with an unsigned one by value.  If you have such a situation, the best approach is to promote both operands to the next wider signed type and do the comparison that way.  So, the standard 10 relations cover unsigned-to-unsigned and signed-to-signed comparisons, but no mixed ones.
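The promotion approach for a mixed comparison might look like the sketch below. The function names are hypothetical, and the example assumes the fixed-width types from <stdint.h> are available.

```c
#include <stdint.h>

/* Hypothetical helper: compares a signed and an unsigned 32-bit value by
   their mathematical values. Both ranges fit in int64_t, so widening
   preserves each value. */
int less_mixed(int32_t a, uint32_t b) {
    return (int64_t)a < (int64_t)b;
}

/* For contrast: reinterpreting the signed operand as unsigned first.
   A negative a becomes a large positive number, so the result changes. */
int less_mixed_unsigned_view(int32_t a, uint32_t b) {
    return (uint32_t)a < b;
}
```

With a = -1 and b = 1, less_mixed returns 1, whereas less_mixed_unsigned_view returns 0 because -1 is viewed as 4294967295.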

Two of them, equal (==, eq) and not equal (!=, ne), apply the same way to signed and unsigned data types: to be equal, the bit patterns must be identical, and signedness doesn't matter (given that both operands are signed or both are unsigned).

For the rest, we must know the signedness of the data type to interpret results properly.  A negative number, if accidentally viewed as unsigned, looks like a large positive number.  So, if we use the wrong comparison operator, then -1 will appear as the largest unsigned value and be larger than 1.  That's why we must know the data type.  Conversely, we can infer whether the data type is signed or unsigned from the comparison operator the code uses.

The industry has generally settled on terminology:

  • above & below for unsigned > and unsigned <
  • above or same & below or same for unsigned >= and unsigned <= (68000)
  • above or equal & below or equal for unsigned >= and unsigned <= (Intel)
  • less than unsigned (ltu) for unsigned < (MIPS/RISC-V)
  • less than & greater than for signed < and signed >
  • less than or equal & greater than or equal for signed <= and signed >=

Let's also add that C (and other high-level languages) use variable declarations to tag variables with data types, and with that the compiler-generated machine code accesses a variable's physical storage consistently as that data type wherever the program uses the variable.

In machine code, by contrast, there are no variable declarations that the processor sees or knows about, so the needed data type information must be conveyed with every instruction that manipulates storage.  To copy data, the processor only needs to know the size, not the signedness; the same goes for comparison by equal/not-equal.  For other operations (inequalities like <, <=, or overflow detection), the processor must be informed of both the data type's size and its signedness.

There are at least two reasons that processors don't read variable declarations.  One is that it would be too much for them to remember; or, to put it another way, we have another way of remembering, which is to incorporate that information into the machine code of the program, so that the program itself knows and tells the processor at every instruction.

The other is that the physical storage of the processor, CPU registers and memory, is frequently being repurposed.  The CPU registers are permanent, but logical variables of high-level languages can be ephemeral, especially parameters and local variables.  Logical variables have scope, and when the scope exits, the variable disappears, leaving the physical storage free to be reused for another purpose, which the assembly program does by simply initializing that physical storage with a new value.  Thus, one moment the same register may hold an unsigned byte and another moment a signed integer.  The machine code program's job is to keep that straight, and the compiler does it in part through type declarations in the source code it is translating.


Conditional branching is somewhat complex, as follows.  In C, we might do something like

if ( a < b ) goto Label;

This relatively simple operation essentially has 4 operands, more than most processors accommodate in one instruction.  The 4 operands are variable 1, variable 2, the specific relational operator, and the goto-target label.

So, one approach used by instruction set designers is to split the 4 operands across 2 separate instructions, like compare and branch (or set).  The compare operation takes variable 1 and variable 2 and, in effect, evaluates all 10 comparisons at once, recording in the flags register enough information to derive any of the 10 results.  The branch instructions take the specific relational operator and the goto-target label; they interpret the flags according to the relational operator to decide whether to branch.

The setxx instructions parallel the branch instructions, but have a register target (a boolean) rather than a goto/branch target.
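To make the compare-then-set split concrete, here is a small C model of an 8-bit compare and a few of the conditions that read its flags. It is only a sketch: the struct and function names are hypothetical and not a real API, and it models just the four flags discussed here (ZF, CF, SF, OF).

```c
#include <stdint.h>

/* Hypothetical model of the flags that cmp updates. */
struct flags { int zf, cf, sf, of; };

/* Models `cmpb src, dst` (AT&T operand order): computes dst - src,
   discards the difference, and keeps only the flags. */
struct flags cmp8(uint8_t dst, uint8_t src) {
    uint8_t diff = (uint8_t)(dst - src);
    struct flags f;
    f.zf = (diff == 0);        /* result is zero */
    f.cf = (dst < src);        /* unsigned borrow occurred */
    f.sf = (diff >> 7) & 1;    /* top bit of the result */
    /* Signed overflow: operands differ in sign and the result's
       sign differs from dst's sign. */
    f.of = (((dst ^ src) & (dst ^ diff)) >> 7) & 1;
    return f;
}

/* Each condition only inspects flags; it never sees the operands. */
int model_setb(struct flags f)  { return f.cf; }          /* unsigned <  */
int model_setl(struct flags f)  { return f.sf != f.of; }  /* signed   <  */
int model_setge(struct flags f) { return f.sf == f.of; }  /* signed   >= */
int model_sete(struct flags f)  { return f.zf; }          /* ==          */
```

With dst = 0xFF and src = 0x01, the modeled setl condition is true (-1 is less than 1 as signed values) while the modeled setb condition is false (255 is not below 1). The same flags give both answers; the instruction chosen after the compare decides which interpretation is used.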

Erik Eidt