QtSpim treating `.asciiz` as a sequence of words?

Question

I have the following MIPS program that simply prints hello world:

.data
    .word 12345
    .word 0x1234ffff
    message: .asciiz "Hello World! \n"
.text
main:
    li $v0, 4
    la $a0, message
    syscall

    li $v0, 10
    syscall

Upon inspecting QtSpim's data tab, it shows the following:

[10000000]..[1000ffff]  00000000
[10010000]    00003039  1234ffff  6c6c6548  6f57206f    9 0 . . . . 4 . H e l l o   W o 
[10010010]    21646c72  00000a20  00000000  00000000    r l d !   . . . . . . . . . . . 
[10010020]..[1003ffff]  00000000

I can see that 12345 and 0x1234ffff have gone into addresses 0x10010000 and 0x10010004 respectively in big-endian format, but what confuses me is the ASCII string. Here is how I tried to interpret it:

10010008: 6c        ‘l’
10010009: 6c        ‘l’
1001000a: 65        ‘e’
1001000b: 48        ‘H’

1001000c: 6f        ‘o’
1001000d: 57        ‘W’
1001000e: 20        ‘ ’
1001000f: 6f        ‘o’

10010010: 21        ‘!’
10010011: 64        ‘d’
10010012: 6c        ‘l’
10010013: 72        ‘r’

10010014: 00        empty for word-alignment?
10010015: 00        ‘\0’
10010016: 0a        ‘\n’
10010017: 20        ‘ ’

So it seems like QtSpim has split the string up into words and stored them in little-endian order.

This doesn't make any sense to me, since the first two words were stored in big-endian format. Why would it suddenly take this weird approach for strings? I especially can't understand why it splits the string into words and then stores them in little-endian order.

That's how the debugger always shows memory: in chunks of 4 bytes at a time, as 32-bit values. It shows the *value* represented (what you'd get if you did an `lw` from there), not the bytes separately, so you can't infer endianness from that display. QtSPIM emulates a MIPS with the host machine's native endianness, so yes, it's totally normal that hex printing order isn't memory byte order. You can try an endian test with some MIPS code, like `li $t0, 0x12345678` / `sw $t0, ($sp)` / `lbu $t1, ($sp)` and see that you get `0x78`, unless you built and ran QtSPIM on a big-endian machine like some PowerPC. — Peter Cordes, Nov 18 '22 at 09:35
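The store-then-load-byte test from the comment above can be modelled outside the simulator. This is only a sketch in Python, not QtSPIM code: the helper name `first_byte_after_store` and the 4-byte `stack` buffer are made up for illustration, and the byte order is passed in explicitly instead of being taken from the host.

```python
import struct

def first_byte_after_store(value, byteorder):
    """Model `sw` of a 32-bit value followed by `lbu` from the same address."""
    fmt = "<I" if byteorder == "little" else ">I"
    stack = bytearray(4)                    # hypothetical 4 bytes at ($sp)
    struct.pack_into(fmt, stack, 0, value)  # sw  $t0, ($sp)
    return stack[0]                         # lbu $t1, ($sp)

print(hex(first_byte_after_store(0x12345678, "little")))  # 0x78
print(hex(first_byte_after_store(0x12345678, "big")))     # 0x12
```

Getting `0x78` back means the least-significant byte sits at the lowest address, which is the little-endian case described in the comment.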
Possible duplicate of [Little-endian MIPS inconsistency](https://stackoverflow.com/q/40667927), except I think they have a different misconception. — Peter Cordes, Nov 18 '22 at 09:38
Thanks for the explanation! So if I understood your comment correctly, the memory viewer just displays the results of effectively doing lw for each n*4 address. — Dashadower, Nov 18 '22 at 09:52
Yes, precisely. And printing that value in normal printing order, most-significant digit first, not lowest-address first. That's why it's printed as a single 8-digit hex number, not 4 pairs. [MARS MIPS Simulator ASCII string not storing in memory in little-endian properly?](https://stackoverflow.com/q/66609885) looks like a duplicate, but I'm not sure my answer there makes that point clearly. — Peter Cordes, Nov 18 '22 at 09:53
A side question: afaik endianness only impacts the *storage order* of words. So regardless of which endianness the system has, lw would consistently read 4 bytes as [addr + 0], [addr + 1], [addr + 2], [addr + 3]. If so, does that mean the assembler takes the endianness of the system into account and adjusts how data is written? — Dashadower, Nov 18 '22 at 09:54
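To see how that display arises, here is a small Python sketch that lays the question's data out byte by byte in address order and then formats it one 32-bit little-endian value at a time. The function name `dump_words` and the row layout are assumptions chosen to resemble the data tab; this is not how QtSPIM itself is implemented.

```python
import struct

def dump_words(memory, base=0x10010000):
    """Format memory as rows of 8-digit hex values, one per 4-byte word."""
    rows = []
    for off in range(0, len(memory), 16):
        chunk = memory[off:off + 16]
        # Interpret each group of 4 bytes as a little-endian 32-bit value,
        # i.e. what `lw` returns on a little-endian MIPS.
        words = struct.unpack("<%dI" % (len(chunk) // 4), chunk)
        rows.append("[%08x]    " % (base + off)
                    + "  ".join("%08x" % w for w in words))
    return rows

# Bytes in address order, lowest address first.
memory = (
    bytes.fromhex("39300000")    # .word 12345 (0x00003039), low byte first
    + bytes.fromhex("ffff3412")  # .word 0x1234ffff, low byte first
    + b"Hello World! \n\0"       # .asciiz: bytes in source order plus NUL
    + b"\0"                      # one padding byte to fill the last word
)

for row in dump_words(memory):
    print(row)
# [10010000]    00003039  1234ffff  6c6c6548  6f57206f
# [10010010]    21646c72  00000a20
```

In this layout the string bytes are in plain source order: `memory[8:12]` is `b"Hell"`, even though the word built from those bytes prints as `6c6c6548`.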
Yes for `.word`, no for `.ascii`/`.asciiz`: strings are a byte-stream that goes into memory in source order. (Except actually no for QtSPIM; it just uses the host machine's native endianness, so it probably just parses a string of digits into a C `uint32_t` variable and stores that into memory in host endianness, with `memcpy` or maybe even a pointer cast. The same goes for its memory accesses when emulating execution of an `lw` or `sw`. On a little-endian host, that creates the effect of simulating a little-endian MIPS.) — Peter Cordes, Nov 18 '22 at 09:56
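The difference between the two directives can be sketched as follows. This is a hedged Python model of the resulting bytes only: `word`, `asciiz`, and `data_segment` are hypothetical helpers, and the explicit `byteorder` argument stands in for whatever endianness the assembler or host ends up using.

```python
def word(value, byteorder):
    """Bytes emitted for `.word value`: the order depends on endianness."""
    return value.to_bytes(4, byteorder)

def asciiz(text):
    """Bytes emitted for `.asciiz "text"`: source order plus a NUL byte."""
    return text.encode("ascii") + b"\0"

def data_segment(byteorder):
    # Same three directives as in the question's .data section.
    return (word(12345, byteorder)
            + word(0x1234ffff, byteorder)
            + asciiz("Hello World! \n"))

little = data_segment("little")
big = data_segment("big")

print(little[:4].hex(), big[:4].hex())  # 39300000 00003039
print(little[8:] == big[8:])            # True
```

Only the `.word` bytes change with endianness; the string bytes are identical in both layouts.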
Also related: **[Is mars MIPS simulator Big or Little Endian](https://stackoverflow.com/q/46533430) has lots of details** - MARS is written in Java, which guarantees always little-endian, so it behaves the same as QtSPIM compiled for a little-endian machine like x86 or ARM64. — Peter Cordes, Nov 18 '22 at 09:59
@PeterCordes: The fact that MARS is written in Java does not by itself guarantee the endianness of the simulated CPU. You can certainly write a simulator in Java for a big-endian architecture — gusbro, Nov 18 '22 at 17:26
@gusbro: I meant that Java's native byte-order is little-endian, even when a JVM is running on big-endian hardware, translating when doing anything that would expose the native endianness, like byte access to wider primitive types if that's even possible. Hmm, now I'm not sure how much Java even allows in terms of an equivalent to memcpy from a char array to a `uint32_t`. Apparently it does have a ByteOrder class to let you know which will be more efficient for allocating a "direct buffer". — Peter Cordes, Nov 18 '22 at 17:40
@gusbro: Anyway, what I was trying to say was that MARS is always little-endian, not host-sensitive because it's written in Java. I see there was room for ambiguity. — Peter Cordes, Nov 18 '22 at 17:41
