Confusion about MIPS I-type instruction sign extend

Question

I am learning the MIPS instructions, and when I test the I-type instructions that need to sign-extend the immediate, I am confused about the following outcomes (all of them run in MARS):

Say we have the source code line ori $s1, $s2, 0xfd10. MARS gives the basic assembler instruction ori $17, $18, 0x0000fd10. This is expected, since ori should zero-extend the 16-bit immediate. If we only change the mnemonic from ori to andi, that is, the source code line andi $s1, $s2, 0xfd10, MARS gives almost the same basic assembler instruction: andi $17, $18, 0x0000fd10. However, unlike ori, andi should use sign extending. So the basic assembler instruction is supposed to be andi $17, $18, 0xfffffd10.

andi also should use zero-extend! Please ignore the first question.

When I try to use slti rt, rs, imm, for example slti $s1, $s2, 0x8000, MARS refuses to assemble the line and the error message is "0x8000": operand is out of range. I see no reason why the immediate is out of range. If I lower the immediate a bit, say, slti $s1, $s2, 0x7fff, it works and the immediate is extended to 0x00007fff. My expectation is that 0x8000 should be extended to 0xffff8000. Is there anything wrong with my understanding?

#1 seems to be a bug. The reason for #2 is to avoid confusion. You wrote `0x8000` hence you presumably wanted `0x8000` and not `0xffff8000`. — Jester, Dec 09 '20 at 17:32

Peter Cordes · Accepted Answer · 2020-12-09T18:31:50.137

Values in asm source represent the actual number values you want to work with, not just the bit-patterns to be encoded into the instruction.

0x8000 is not the same number as 0xffff8000 so the assembler stops you from having your value munged by sign-extension. If you wanted the final instruction's machine code to encode the value 0xffff8000, you should write 0xffff8000 in the asm source for instructions that sign-extend their immediates.

In our place value writing-system for numbers, there are an infinite number of implicit high 0 digits to the left of the explicit digits. So 0x8000 is the same number as 0x00008000, and that's the number that the assembler is trying to represent as a 16-bit sign-extended immediate.
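As a rough sketch of that check (not MARS's actual source), the Python below treats each source literal as a 32-bit word and asks whether it survives a round trip through a 16-bit sign-extended field. The helper names `to_signed32` and `fits_sign_extended_imm16` are made up for illustration, and treating literals as 32-bit two's-complement words is an assumption that matches 0xffff8000 being accepted.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fits_sign_extended_imm16(literal):
    """True if the literal's 32-bit value equals some sign-extended 16-bit field."""
    return -0x8000 <= to_signed32(literal) <= 0x7FFF


# Literals as they might appear in the asm source (hypothetical test inputs).
for text in ("0x7fff", "0x8000", "0xffff8000", "-123"):
    value = int(text, 0)  # base 0 accepts both hex and decimal spellings
    verdict = "ok" if fits_sign_extended_imm16(value) else "operand is out of range"
    print(f"{text:>10}: {verdict}")
```

Under this model 0x8000 is +32768, one past the largest positive value the field can hold, while 0xffff8000 is -32768 and fits.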

You're approaching this from the PoV of how I-type instructions are encoded. But assemblers are designed to handle the encoding details for you. That's part of the point of using one. Say you write addiu $t0, $t1, -123 and the assembler encodes -123 as a 16-bit sign-extended immediate.

Or say you write ori $t0, $t0, -256 to set all the bits above the low byte. The assembler rejects that, because -256 is not encodeable as a zero-extended immediate for ori. The alternative would be to silently encode 0xff00, which the CPU would zero-extend to 0x0000ff00, leaving the upper 16 bits unset. So you don't have to memorize how each instruction treats its immediate; the assembler checks that for you. This is an intentional feature and a good design.

Especially if you had a large program that defines some assemble-time constants and then uses them in various ways: if tweaking one of those values resulted in an instruction not being encodeable, you'd want to know about it instead of getting silently wrong results.

(And since I used decimal examples, writing numbers as hex numeric literals doesn't change anything about how the assembler should treat them.)
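To make the ori $t0, $t0, -256 case concrete, here is a small Python model of what would happen if an assembler truncated the value instead of rejecting it. The function names and the starting register value 0x12 are invented for the example; only the zero-extension behaviour of ori comes from the text above.

```python
def fits_zero_extended_imm16(literal):
    """True if the literal's 32-bit value has all-zero upper 16 bits."""
    return (literal & 0xFFFFFFFF) <= 0xFFFF


def ori(rs_value, imm_field):
    """What the hardware computes: rs OR the zero-extended 16-bit field."""
    return (rs_value | (imm_field & 0xFFFF)) & 0xFFFFFFFF


wanted = -256 & 0xFFFFFFFF         # 0xffffff00, the value the programmer wrote
truncated_field = wanted & 0xFFFF  # 0xff00, all that fits in the instruction

print(fits_zero_extended_imm16(-256))      # False: the assembler should complain
print(hex(ori(0x12, truncated_field)))     # 0xff12, what silent truncation would give
print(hex(0x12 | wanted))                  # 0xffffff12, what the programmer wanted
```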

"However, unlike ori, andi should use sign extending."

No, in MIPS all 3 bitwise boolean logical instructions (ori/andi/xori) zero-extend their immediates. (Sign-extension would have been more useful in more cases for AND, allowing masks with only a few zeros in the low bits, but that's not how MIPS is designed. It would also have made truncation to exactly 16 bits more expensive.)
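The trade-off in that parenthetical can be sketched in Python. `andi` below models the real zero-extending behaviour; `andi_sign_extending` is a hypothetical variant that does not exist in MIPS and is shown only for comparison.

```python
def zero_extend16(field):
    """Widen a 16-bit field to 32 bits with zeros in the upper half."""
    return field & 0xFFFF


def sign_extend16(field):
    """Widen a 16-bit field to 32 bits by copying bit 15 into the upper half."""
    field &= 0xFFFF
    if field & 0x8000:
        return field | 0xFFFF0000
    return field


def andi(rs_value, imm_field):
    """Real MIPS behaviour: AND with the zero-extended immediate."""
    return rs_value & zero_extend16(imm_field)


def andi_sign_extending(rs_value, imm_field):
    """Hypothetical variant (not a real MIPS instruction)."""
    return rs_value & sign_extend16(imm_field)


x = 0x12345678
print(hex(andi(x, 0xFFFF)))                 # 0x5678: truncates to 16 bits
print(hex(andi_sign_extending(x, 0xFFF0)))  # 0x12345670: clears only the low 4 bits
print(hex(andi_sign_extending(x, 0xFFFF)))  # 0x12345678: mask is all ones, no truncation
```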

Documentation like https://ablconnect.harvard.edu/files/ablconnect/files/mips_instruction_set.pdf confirms andi zero-extends. I didn't check official MIPS docs, but this info is widespread on the Internet; you could also check that compilers use it that way to implement uint16_t or whatever.

See also "andi vs. addi instruction in MIPS with negative immediate constant", which covers MARS with extended pseudo-instructions enabled: in that mode it will construct a full 32-bit value in another register if you use andi with a value that's not encodeable as a 16-bit zero-extended immediate.

Thanks a lot. For the `ori`, I was misled by a lecture note with a typo. I checked various MIPS sheets, like https://inst.eecs.berkeley.edu//~cs61c/fa11/MIPS_Green_Sheet.pdf, and confirmed that `ori` also uses zero-extend. I still do not really understand the part about `slti`. `slti $s1, $s2, 0xffff8000` is also refused by MARS. And what kind of negative number will be sign extended by `slti`? If I use `sltiu $s1, $s2, 0x8000`, the error is the same. I searched a lot, but failed to find an explanation of the process of sign extending a number like `0x8000` for `slti` or `sltiu`. — codekiwi, Dec 09 '20 at 18:00
@codekiwi: I just tried in MARS 4.5 and `slti $s1, $s2, 0xffff8000` assembles as expected, to `0x2a518000 slti $17, $18, 0xffff8000`. It works because `0xffff8000` is representable as a 16-bit sign-extended integer. — Peter Cordes, Dec 09 '20 at 18:08
Thank you. I reopened my MARS 4.5, and `slti $s1, $s2, 0xffff8000` worked. I understand `0xffff8000` is representable as a 16-bit sign-extended integer. My question is, since `0x1000` can be automatically extended to `0x00001000` by the assembler, why `0x8000` cannot? Why the `0xffff8000` has to be entered in the source explicitly by manual input? Sorry if this is a dumb question. — codekiwi, Dec 09 '20 at 18:21
Because 0x1000 and 0x00001000 are the same number, whereas 0x8000 and 0xffff8000 are not. — prl, Dec 09 '20 at 18:23
@codekiwi: When I write `0xffff` as a numeric literal constant in C, assembly, or whatever, the implicit higher bits are assumed to be 0. i.e. it's the same number as `0x0000ffff`, not `-1`. You're writing the value you want the instruction to use, not the *encoding* you want for the MIPS I-type instruction. That's what the first part of my answer is trying to explain. — Peter Cordes, Dec 09 '20 at 18:24
Thank you very much, Peter. I compared `slti` with `lw` and `addi`, and I am beginning to understand the difference between "the value you want the instruction to use" and "the encoding you want for the MIPS I-type instruction". The data sheets simply say some instructions need sign-extension and provide no explanation of the process done by the assembler, while the textbook explains what sign extension is but does not connect it with practice. Thank you for your patience in helping me out. — codekiwi, Dec 09 '20 at 18:55
@codekiwi: The data sheet of course is documenting how the CPU processes the machine-code, nothing more. Understanding the design philosophy of typical assemblers' source language is a separate and totally different thing, and yeah isn't usually spelled out by documentation. The fact that numbers are numbers is kind of implicit. Different assemblers can make different choices, e.g. whether to warn and truncate, or error, when a number doesn't fit in an immediate. — Peter Cordes, Dec 09 '20 at 19:01
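The machine word quoted in the comments, 0x2a518000 for slti $17, $18, 0xffff8000, can be reproduced by packing the I-type fields by hand. This is a minimal sketch: `encode_i_type` is an illustrative helper rather than anything from MARS, and it assumes the standard I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) with slti's major opcode 0x0A.

```python
SLTI_OPCODE = 0x0A  # slti major opcode (binary 001010)


def encode_i_type(opcode, rs, rt, imm16):
    """Pack the I-type fields: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


# slti $s1, $s2, 0xffff8000: rt = $s1 = register 17, rs = $s2 = register 18.
# Only the low 16 bits (0x8000) are stored; the CPU sign-extends them back
# to 0xffff8000 when the instruction executes.
word = encode_i_type(SLTI_OPCODE, rs=18, rt=17, imm16=0xFFFF8000)
print(hex(word))  # 0x2a518000
```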
