Accessing code segment memory using SW and LW in MIPS

Question

Is it possible to access the code segment memory using the SW and LW instructions in MIPS, given the address of the instructions?

For example:

0x1000: ADDI $s1, $zero, 0x1000
0x1004: LW $s2, 4($s1)

What would the code load into $s2: 0x0000 (given that the data segment is empty), or the binary representation of the instruction at 0x1004?

EDIT:

AFAIK, pipelining in a MIPS processor is possible because the instruction memory and the data memory are separate. Correct me if I'm wrong.

EDIT 2:

I've found a question whose answer implies that instructions can be accessed and modified using LW and SW. So the answer is that $s2 will contain the binary representation of the instruction at 0x1004.

@old_timer: [segments of the executable file (data / text / bss)](https://stackoverflow.com/questions/14361248/whats-the-difference-of-section-and-segment-in-elf-file-format), not a segmented memory model. But yeah, it appears the OP is confused about program segments vs. x86-style segmented memory (which MIPS doesn't have). — Peter Cordes, Dec 25 '17 at 20:02
@old_timer yes, according to the first google result :): https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Mips/dataseg.html — Aryéh Radlé, Dec 25 '17 at 20:02
Did you mean to write `ADDI $s1, $zero, 0x1000`? Or simply `LW $s2, 0x1004($zero)`? Otherwise your load address depends on the initial value of `$s1`. — Peter Cordes, Dec 25 '17 at 20:09

Peter Cordes · Accepted Answer · 2017-12-27T19:19:19.333

You will load the machine encoding of the instruction at address 0x1004.

MIPS has a flat memory model: different segments of an executable are mapped / loaded into different parts of a single flat address space. It's a Von Neumann architecture, where code bytes and data bytes are the same thing and share the same address space.
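
As a concrete check, here is a hedged Python sketch (a toy model, not SPIM, MARS, or real hardware) that encodes the question's two instructions with the standard MIPS I-type layout, opcode(6) | rs(5) | rt(5) | imm(16), and then performs the LW against the same flat memory the instructions live in. The register numbers ($zero = 0, $s1 = 17, $s2 = 18) and opcodes (ADDI = 0x08, LW = 0x23) come from the MIPS encoding; the dictionary standing in for memory is an assumption of the sketch, and byte order is irrelevant because the whole word is read back.

```python
# Toy model (not a real MIPS simulator): one flat, word-addressed memory that
# serves both instruction fetch and data loads, as on a Von Neumann machine.

def encode_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack a MIPS I-type instruction: opcode(6) | rs(5) | rt(5) | imm(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

ZERO, S1, S2 = 0, 17, 18        # register numbers for $zero, $s1, $s2
ADDI, LW = 0x08, 0x23           # primary opcodes

# The program itself, placed at the addresses used in the question.
memory = {
    0x1000: encode_i_type(ADDI, ZERO, S1, 0x1000),  # ADDI $s1, $zero, 0x1000
    0x1004: encode_i_type(LW, S1, S2, 4),           # LW   $s2, 4($s1)
}

regs = [0] * 32
regs[S1] = regs[ZERO] + 0x1000          # effect of the ADDI
address = regs[S1] + 4                  # effective address of the LW: 0x1004
regs[S2] = memory.get(address, 0)       # load from the same memory code lives in

print(f"$s2 = {regs[S2]:#010x}")
```

It prints `$s2 = 0x8e320004`, which is the encoding of `LW $s2, 4($s1)` itself, not zero.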

Code addresses use the same address space as data addresses. Martin's answer suggests it may be possible to create a MIPS where at least the permissions are different, and of course an embedded MIPS with its code in ROM couldn't modify its instructions with stores. But even then, code and data would have to be mapped into different parts of the same physical address space, even if stores to code addresses faulted. Possibly you could build a MIPS where even loads from code addresses faulted, but that's unlikely. Jumps to data addresses might also fault if you disabled execute permission on that region / page.
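
The permission cases above can be pictured with a small, hypothetical Python model in which each kind of access checks its own bit: instruction fetch needs X, a load (LW) needs R, a store (SW) needs W. The region table and the `check` / `allowed` helpers are made up for illustration and do not correspond to any real MIPS MMU or TLB interface. With this typical layout, loads of code addresses succeed, stores to code fault, and jumps into data fault.

```python
# Hypothetical per-region permission check; real MIPS MMUs/TLBs differ in detail.
# Each access kind needs its own bit: fetch -> x, load -> r, store -> w.

class AccessFault(Exception):
    """Stands in for the exception a real CPU would raise on a bad access."""

# (start, end_exclusive, permissions): an illustrative layout, not a real memory map
REGIONS = [
    (0x1000, 0x2000, "rx"),   # code in ROM: readable and executable, not writable
    (0x2000, 0x3000, "rw"),   # data: readable and writable, not executable
]

NEEDED = {"fetch": "x", "load": "r", "store": "w"}

def check(address: int, kind: str) -> None:
    """Raise AccessFault if an access of this kind to `address` is not permitted."""
    for start, end, perms in REGIONS:
        if start <= address < end:
            if NEEDED[kind] not in perms:
                raise AccessFault(f"{kind} at {address:#x}: region is '{perms}'")
            return
    raise AccessFault(f"{kind} at {address:#x}: unmapped")

def allowed(address: int, kind: str) -> bool:
    try:
        check(address, kind)
    except AccessFault:
        return False
    return True

print(allowed(0x1004, "load"))    # True:  LW of a code address succeeds
print(allowed(0x1004, "store"))   # False: SW to ROM code faults
print(allowed(0x2000, "fetch"))   # False: jumping into data faults
print(allowed(0x1004, "fetch"))   # True:  normal instruction fetch
```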

On a normal MIPS with its instructions in RAM, self-modifying code is possible if you have write+exec permissions configured. (But note that for correctness you would usually need to flush the I-cache, which the code in that Q&A isn't doing.)
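
Why the flush matters can be sketched with a toy Python model (hypothetical class and method names, single-word cache lines, no pipeline). When the I-cache is not kept coherent with data stores, a fetch after the store can still return the old word until the I-cache is synchronized; a CPU that snoops stores, as modern x86 does, sees the new word at once. On MIPS32 Release 2 and later, the `SYNCI` instruction is one real way to do that synchronization; the model's `flush_icache` simply clears everything. The model is also deterministic, whereas real hardware without synchronization makes no promise about whether the old or the new instruction runs.

```python
# Toy model of a non-coherent vs. a store-snooping instruction cache.
# Not a model of any particular CPU: "lines" are single words, and the
# data side is assumed to write straight through to memory.

class Machine:
    def __init__(self, snoop_stores: bool):
        self.memory = {}                  # shared code+data memory: address -> word
        self.icache = {}                  # instruction cache: address -> word
        self.snoop_stores = snoop_stores

    def fetch(self, addr: int) -> int:
        """Instruction fetch goes through the I-cache, filling it on a miss."""
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr, 0)
        return self.icache[addr]

    def store(self, addr: int, word: int) -> None:
        """A data store (SW). It only updates memory unless stores are snooped."""
        self.memory[addr] = word
        if self.snoop_stores:
            self.icache.pop(addr, None)   # x86-style: drop the stale cached copy

    def flush_icache(self) -> None:
        """Stand-in for explicit cache maintenance before running new code."""
        self.icache.clear()

OLD, NEW = 0x11111111, 0x22222222
results = {}
for snoop in (False, True):
    m = Machine(snoop_stores=snoop)
    m.memory[0x1004] = OLD
    m.fetch(0x1004)                       # old instruction is now cached
    m.store(0x1004, NEW)                  # self-modifying store
    before = m.fetch(0x1004)
    m.flush_icache()
    after = m.fetch(0x1004)
    results[snoop] = (before, after)
    print(f"snoop_stores={snoop}: fetch sees {before:#x} before flush, {after:#x} after")
```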

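And BTW, `.data` in the asm source really means the `.data` section, which the linker eventually links into the data segment of the executable. See [What's the difference of section and segment in ELF file format](https://stackoverflow.com/questions/14361248/whats-the-difference-of-section-and-segment-in-elf-file-format).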

The most important point here is that segments of an executable aren't the same thing as x86-style segmented memory (though the terminology has a similar origin).

edited Dec 27 '17 at 19:19

answered Dec 25 '17 at 20:22

Peter Cordes

AFAIK, pipelining in a MIPS processor is possible because the instruction memory and the data memory are separate. Correct me if I'm wrong. – Aryéh Radlé Dec 26 '17 at 07:42
@ArieR: You are wrong. If I recall correctly, MIPS doesn't give you any guarantees about what happens if you store to instructions that are about to execute: the old or the new instruction might run. If you want the stored data to be visible as instructions, you have to flush the cache or something similar before they execute. (ARM works this way: instruction caches and the pipeline aren't coherent with data stores, and you need cache maintenance followed by an `isb` instruction-synchronization barrier.) I didn't mention any of this because you were only asking about loads, and shared read-only access as code and data is trivial. – Peter Cordes Dec 26 '17 at 07:53
But x86 does have a coherent I-cache. Modern x86 implementations [snoop stores and clear the pipeline if they detect self-modifying code](https://stackoverflow.com/a/18388700/224132), so it's even possible to pipeline a CPU where a store to an address a couple of instructions ahead of the program counter has to be visible to code fetch. It's much easier to pipeline architectures without a coherent code cache / pipeline, though! (So yes, it's an issue, but the solution is just to say "don't do that if you want predictable results".) – Peter Cordes Dec 26 '17 at 07:56
I will accept this answer if you can provide a reference from some online resource - I'm having a hard time finding one – Aryéh Radlé Dec 26 '17 at 08:17
@ArieR: For what? That MIPS is a Von Neumann architecture where code and data share the same address space? You probably don't see that mentioned because almost every architecture is like that; it's only worth mentioning when it's *not* the case. This is necessary to implement a JIT compiler (like many JVMs use): generate machine code as data, then execute it. Even a regular compiler works this way in an OS with a disk cache: the executable code doesn't have to be flushed to disk and reloaded before it can execute. All you need is to sync the code cache + pipeline. – Peter Cordes Dec 26 '17 at 08:26
@ArieR: Found a reference: https://superuser.com/questions/81949/is-the-mips-architecture-more-related-to-harvard-or-von-neumann. – Peter Cordes Dec 26 '17 at 08:34
Found a reference myself... https://stackoverflow.com/questions/29262391/self-modifying-mips-code
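
The JIT point from the comments (generate machine code as data, then execute it) can be illustrated with a toy Python interpreter that writes instruction words with ordinary data stores and then fetches and runs them from the same flat memory. It is a sketch under stated assumptions: only ADDI and LW are decoded, ADDI's overflow trap is ignored, and there is no I-cache in the model, so the synchronization step a real JIT needs does not appear. `sw`, `lw`, and `encode_i_type` are hypothetical helpers, not any real JIT's API.

```python
# Toy "JIT": emit MIPS machine words with ordinary data stores, then fetch and
# execute them from the same flat memory. Only ADDI and LW are modelled.

def encode_i_type(opcode, rs, rt, imm):
    """Pack opcode(6) | rs(5) | rt(5) | imm(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def sign_extend16(value):
    return value - 0x10000 if value & 0x8000 else value

ADDI, LW = 0x08, 0x23
ZERO, S1, S2 = 0, 17, 18

memory = {}                       # one address space for code and data
regs = [0] * 32

def sw(addr, word):               # a data store
    memory[addr] = word & 0xFFFFFFFF

def lw(addr):                     # a data load
    return memory.get(addr, 0)

# 1. "Compile": write instructions into memory as plain data.
code_base = 0x1000
program = [
    encode_i_type(ADDI, ZERO, S1, 0x1000),   # ADDI $s1, $zero, 0x1000
    encode_i_type(LW, S1, S2, 4),            # LW   $s2, 4($s1)
]
for i, word in enumerate(program):
    sw(code_base + 4 * i, word)

# 2. "Jump" to the generated code: fetch from the same memory and execute.
pc = code_base
for _ in program:
    instr = memory[pc]                        # instruction fetch
    op, rs, rt = instr >> 26, (instr >> 21) & 31, (instr >> 16) & 31
    imm = sign_extend16(instr & 0xFFFF)
    if op == ADDI:
        regs[rt] = (regs[rs] + imm) & 0xFFFFFFFF
    elif op == LW:
        regs[rt] = lw((regs[rs] + imm) & 0xFFFFFFFF)
    regs[ZERO] = 0                            # $zero stays zero
    pc += 4

print(f"$s1 = {regs[S1]:#x}, $s2 = {regs[S2]:#010x}")
```

It prints `$s1 = 0x1000, $s2 = 0x8e320004`: the generated LW reads back its own encoding, because the bytes it was stored as data are the bytes it was fetched from.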

Martin Rosenau · Answer 2 · 2017-12-25T20:15:26.290

It depends on what you mean by "MIPS":

A real MIPS CPU, like the ones you find in some WLAN routers?
Some MIPS emulator like SPIM or MARS?

In the case of a real MIPS CPU, it depends on how the memory management unit is configured:

If the memory management unit allows read access to the code segment you will indeed get the binary representation of the instruction at address 0x1004.

(By the way: you would need to use addi $s1, $0, 0x1004 to ensure $s1 really contains 0x1004, because $s1 could contain a value other than 0.)

If the memory management unit does not allow access to the code segment, the program will crash. (Most MIPS CPUs don't seem to support this setting.)

If you use an emulator like SPIM or MARS (or any other one), it depends on how the emulator works...

Theoretically there could be three types of emulators:

Some which crash
Some which read the binary representation
Some which read some "stupid" value
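
Whether an emulator reads the binary representation or a "stupid" value comes down to where it keeps any pre-decoded instructions, a point also raised in the comments. A hedged Python sketch of the well-behaved design (class and field names are hypothetical, not taken from SPIM or MARS): decoded instructions live in a host-side cache, and guest memory keeps the raw words, so LW from a code address returns the real encoding.

```python
# Toy simulator that keeps decoded instructions in a host-side cache keyed by
# address, while guest memory holds the raw 32-bit machine words. Guest loads
# therefore see real encodings, never the simulator's private decoded form.

from dataclasses import dataclass

@dataclass(frozen=True)
class Decoded:
    opcode: int
    rs: int
    rt: int
    imm: int

class Simulator:
    def __init__(self):
        self.memory = {}      # guest-visible memory: address -> raw word
        self._decoded = {}    # host-only decode cache: address -> Decoded

    def decode_at(self, pc: int) -> Decoded:
        """Decode the I-type fields of the word at pc, caching the result."""
        if pc not in self._decoded:
            word = self.memory.get(pc, 0)
            self._decoded[pc] = Decoded(opcode=word >> 26, rs=(word >> 21) & 0x1F,
                                        rt=(word >> 16) & 0x1F, imm=word & 0xFFFF)
        return self._decoded[pc]

    def load_word(self, addr: int) -> int:
        """LW: reads the raw word, exactly what was stored there."""
        return self.memory.get(addr, 0)

    def store_word(self, addr: int, word: int) -> None:
        """SW: updates memory and discards any stale decode of that address."""
        self.memory[addr] = word & 0xFFFFFFFF
        self._decoded.pop(addr, None)

sim = Simulator()
sim.memory[0x1004] = 0x8E320004            # LW $s2, 4($s1)
print(sim.decode_at(0x1004))               # Decoded(opcode=35, rs=17, rt=18, imm=4)
print(f"{sim.load_word(0x1004):#010x}")    # 0x8e320004
```

Dropping the cached decode on every store is a design choice of this sketch; it makes self-modifying code behave as if the simulator had a coherent I-cache.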

edited Dec 25 '17 at 20:15

answered Dec 25 '17 at 20:04

Martin Rosenau

Does MIPS really allow you to set up memory regions / pages with execute but not (data) read permission? Are you also implying that code-fetch might use a different address space than data (read some "stupid" value)? I don't see how anything but reading the instruction word is plausible except with a buggy emulator. – Peter Cordes Dec 25 '17 at 20:08
Thank you for the answer. I would accept it - but I'm missing the default behaviour here. What would you expect as a default in a real world MIPS CPU? – Aryéh Radlé Dec 25 '17 at 20:09
@ArieR: definitely the "default" expectation is that you'd get the machine encoding of the instruction at address `0x1004`. MIPS has a flat memory model: different segments of an executable are mapped / loaded into different parts of a single flat address space. It's a Von Neumann architecture, where code bytes and data bytes are the same thing. And BTW, `.data` in the asm source really means the `.data` section, which the linker eventually links into the data segment of the executable. – Peter Cordes Dec 25 '17 at 20:12
@PeterCordes I just looked up the MIPS R4400 reference manual because I didn't know either. As far as I understand, at least the R4400 does not allow this. However, I know some MPUs (not MIPS) where the memory protection unit in a SoC is designed by a different vendor. In that case a third-party MPU might protect code memory from reading. – Martin Rosenau Dec 25 '17 at 20:14
Hmm, with paging if there are separate iTLB and dTLBs, and if they're software-managed (traditional for MIPS), then the OS could maybe implement separate RWX permissions if it wanted to. But without paging, yes you're limited by what the hardware MMU can do. And with paging, you'd still have to figure out whether it was a data or code access. – Peter Cordes Dec 25 '17 at 20:18
@PeterCordes a simulator can return a "stupid" value if its representation of the code segment is not MIPS opcodes but pre-processed interpreter data, and there's no true "code" memory with the original data any more. AFAIK MARS and SPIM even support self-modifying code, although it may need additional setup/options to make it work, so they are definitely capable of emulating code memory in the normal way. (I said simulator, not emulator: I would expect an emulator to work with MIPS opcodes only.) – Ped7g Dec 25 '17 at 20:49
@Ped7g: That would be a bug, IMO. If a simulator wants to cache decoded instructions, it must do it internally, not in guest memory space! I'm not sure there's an agreed-upon specific meaning for simulator vs. emulator. Are you thinking simulator means it might only work at the asm-source level? I guess that's one possible kind of simulator, which doesn't fully simulate a MIPS machine. – Peter Cordes Dec 25 '17 at 20:50
@Ped7g: Ah, I like [the top answer here](https://stackoverflow.com/questions/1584617/simulator-or-emulator-what-is-the-difference): an emulator reproduces the externally-visible behaviour, but can do anything internally (e.g. JIT dynamic recompilation). A *simulator* models the actual underlying state of the target. So a "simulator" is a more-accurate / faithful kind of emulator. Another answer on the same question: *An emulator can replace the original for 'real' use. A simulator is a model for analysis.* So yes, it makes sense to call MARS / SPIM simulators, with accurate emulation. – Peter Cordes Dec 25 '17 at 20:59
@PeterCordes I like that definition a lot, but I'm afraid there are cases where it is used in exactly the opposite way, like `wine` (I'm never sure what they call it, except "wine"; they intentionally avoid the common words). I find MARS a bit inaccurate, as it's practically impossible to profile anything with it (it only gives statistics on the instructions executed). – Ped7g Dec 25 '17 at 21:04
@Ped7g: Sure, MARS isn't a cycle-accurate simulator; its model of how a MIPS CPU executes instructions doesn't include modeling performance, only architectural state. It's a question of the level of detail of the simulation, like the answer I linked pointed out. Cycle-accurate CPU simulators are rare for modern pipelined CPUs. Some emulators designed for playing old games (old arcade machines, consoles, and even desktops like Atari ST) use them, because old games that target a fixed HW platform often have timing-dependent code so correctness requires cycle-accurate simulation. – Peter Cordes Dec 25 '17 at 21:21
@PeterCordes yes, I'm probably spoiled by ZX Spectrum emulators, where ~30% of the most demanding productions didn't work in emulators for quite some time, until some people finally worked out the inner details of the ULA chip and made the emulators accurate in timing too: memory access, echoes on the bus, RAM refresh, etc. (The timing of the CPU emulation was perfect early on, but that was not enough: most of the anti-piracy tricks and special effects also required perfect timing against memory (which differs a bit per address range) and the memory refresh register.) Hence my level of expectations ;) – Ped7g Dec 25 '17 at 21:31