From what I can understand, there is an inconsistency with the endianness in MIPS assembly when run on QtSpim (on an x86_64 machine, which means QtSpim is little-endian). However, I am not sure if it's a bug or if I'm wrong.

When a word is loaded into a register, the bytes are not reversed to reflect little-endianness. For example, if a word in memory contains 0x11223344 and we load it in a register, we get 0x11223344 (I would expect 0x44332211).

Consider the following snippet:

    .text
    .globl main
main:
    la   $t0, letters
    lw   $t1, 0($t0)   # Expected ($t1): 0x61626364
    sll  $t2, $t1, 8   # Expected ($t2): 0x62636400
    sw   $t2, 0($t0)   # Expected (mem): 0x00646362
    jr   $ra

    .data
letters:
    .ascii "abcd"

Before the program runs, "abcd" is stored little-endian, as expected: 0x64636261 (dcba). After the program has completed, I would expect 0x00646362 (\0dcb) to be stored in memory; however, 0x63626100 (cba\0) is stored.

Why is this the case?

Tested on Fedora 24, x86_64, QtSpim version 9.1.17.

plafer

1 Answer

There's no inconsistency. The lw puts 0x64636261 in the register, and you then shift it 8 bits to the left (i.e. shifting out bits to the left while shifting in zeroes from the right). So 0x64636261 becomes 0x63626100, and that is the word the sw writes back.
If you wanted 0x00646362 you should've used srl instead of sll.
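
To make the three steps concrete, here is a rough model of the program in Python rather than MIPS, so it can be run on any host. It assumes a little-endian memory, which is what `"<I"` selects in the `struct` module; `run_program`, `sll8` and `srl8` are names made up for this sketch and are not part of QtSpim.

```python
import struct

def run_program(shift):
    """Model lw / shift / sw on a little-endian memory holding "abcd"."""
    memory = bytearray(b"abcd")                  # .ascii "abcd" at letters
    (t1,) = struct.unpack_from("<I", memory, 0)  # lw $t1, 0($t0)
    t2 = shift(t1) & 0xFFFFFFFF                  # registers are 32 bits wide
    struct.pack_into("<I", memory, 0, t2)        # sw $t2, 0($t0)
    return t1, t2, bytes(memory)

def sll8(word):
    return word << 8   # sll $t2, $t1, 8

def srl8(word):
    return word >> 8   # srl $t2, $t1, 8

for name, shift in (("sll", sll8), ("srl", srl8)):
    t1, t2, mem = run_program(shift)
    print(f"{name}: $t1={t1:#010x} $t2={t2:#010x} memory={mem!r}")
```

With sll the bytes left in memory are `\0abc` (the word 0x63626100); with srl they are `bcd\0` (the word 0x00646362).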

Here's an ASCII diagram showing the relationship between a 32-bit word and the individual bytes in a little-endian configuration:

0x64636261
  | | | |
  | | | ----> |'a'| letters
  | | |       -----
  | | ------> |'b'| letters+1
  | |         -----
  | --------> |'c'| letters+2
  |           -----
  ----------> |'d'| letters+3
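
The same mapping can be printed programmatically. This is only an illustrative Python sketch: `byte_layout` is a hypothetical helper, and the `letters` labels simply mirror the diagram above.

```python
def byte_layout(word, base="letters"):
    """List (address label, byte value, character) for a 32-bit word, little-endian."""
    rows = []
    # to_bytes(4, "little") yields the least significant byte first,
    # i.e. the byte that lives at the lowest address.
    for offset, value in enumerate(word.to_bytes(4, "little")):
        label = base if offset == 0 else f"{base}+{offset}"
        rows.append((label, value, chr(value)))
    return rows

for label, value, char in byte_layout(0x64636261):
    print(f"{label:<10} 0x{value:02x} {char!r}")
```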
Michael
  • Since this is little-endian, I would expect my register `$t1` to contain `0x61626364` after the load (the bytes in reverse order), and `0x62636400` after the left shift. When we store it back into memory, it is stored in reversed order, which would give `0x00646362`. Am I making any wrong assumptions? – plafer Nov 18 '16 at 15:05
  • _"Am I making any wrong assumptions?"_. Yes. In a little-endian configuration, the least significant byte is located at the lowest address. The first character at `letters` is `'a'` (0x61), then `'b'` (0x62), etc. Hence, `lw` from `letters` will give you `0x64636261`. – Michael Nov 18 '16 at 15:11
  • @plafer strings are stored in sequential order, not depending on endianness – phuclv Nov 18 '16 at 15:35
  • @Michael _"`lw` from `letters` will give you `0x64636261`"_. Actually, since they are stored `0x64636261`, when I load, I should get `0x61626364` (as confirmed by this question: http://stackoverflow.com/q/8050107/3499862 "_If the machine is Little Endian then will be reversed and then copied to the register._"). The rest of my deduction follows from this fact. I understand (as depicted in the ASCII diagram) how they are laid out in memory, but where we disagree is the order in which they are loaded in the register. Is my confusion clear now? – plafer Nov 18 '16 at 18:45
  • @LưuVĩnhPhúc Actually, since "abcd" is stored as 0x64636261 (dcba), I understand they are stored like any other bit pattern. Little-endian vs Big-endian is independent of the "type" of bit patterns. – plafer Nov 18 '16 at 18:47
  • _" since they are stored 0x64636261, when I load, I should get 0x61626364"_. No. If you've got `0x64636261` stored in memory, then you've got `0x64636261` stored in memory, and that's what you should get back when you load it (as a word). The way I see it you're looking at this from the wrong point of view. A string is an array of characters, with the first character at the lowest address, then the second character, etc. That's regardless of endianness. And if you load a word from memory, the least significant byte will come from the lowest address (in a LE config). The rest follows from that. – Michael Nov 18 '16 at 19:14
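
The two points in the last comment can be checked with a small Python sketch (a model of the byte order only, not of QtSpim itself; `round_trip` is a name invented here). The string bytes sit in the same order in memory either way, only the word read from them differs between configurations, and storing a word then loading it back never reverses it.

```python
data = b"abcd"  # 'a' at the lowest address, regardless of endianness

le_word = int.from_bytes(data, "little")  # what a little-endian lw would read
be_word = int.from_bytes(data, "big")     # what a big-endian lw would read

def round_trip(word, byteorder):
    """Store a 32-bit word to memory and load it back with the same byte order."""
    stored = word.to_bytes(4, byteorder)
    return int.from_bytes(stored, byteorder)

print(f"{le_word:#010x} {be_word:#010x}")
print(f"{round_trip(0x11223344, 'little'):#010x}")
```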