I'm new to assembly and am trying to figure out this code:

072A:100 mov word ptr [0107], 4567
072A:106 mov ax, 1234
072A:109 add ax, dx

What I understand is that the first instruction puts two bytes with values 67 45 at address 072A:107. In the end, AX = 4567.

What I don't understand is why the later instruction mov ax, 1234 doesn't change the value that the previous mov word ptr [0107] instruction wrote at address 072A:107. Why isn't the dump changed?

Thank you in advance.

Pooshkis
  • What is your question exactly? Why `mov ax, 1234` isn't shown as `mov ax, 4567` instead..? Have you tried executing the code once and then generating the disassembly again? – Michael Nov 08 '18 at 09:45
  • mind the segments. 'word ptr [107]' isn't necessarily CS:[107] – Tommylee2k Nov 08 '18 at 10:00
  • Self-modifying code has stopped being practical quite a long time ago. Modern processors prefetch and pre-decode instructions well before they are executed. As-is, this code requires a special instruction between the two, a serializing instruction like `cpuid`. – Hans Passant Nov 08 '18 at 10:25
  • @Pooshkis how about rephrasing the title of the question? Something like "why does a later instruction, modified by a previous one, not reset back when executed"? Your current one reads more like you are asking why `mov ax,1234` does not modify the previous instruction, and that's hopefully clear: it doesn't write any memory, so it can't modify any instruction at all. Or did you have something else in mind that the proposed title doesn't capture? – Ped7g Nov 08 '18 at 10:43
  • @HansPassant: Fun fact: actual hardware implementations of x86 have stronger i-cache coherency than what's required on paper, because being exactly as weak as the paper spec would be slow, as Andy Glew explains in [Observing stale instruction fetching on x86 with self-modifying code](https://stackoverflow.com/a/18388700). I think the most that any x86 has ever required is a taken jump to avoid stale instruction fetch, but modern OoO-exec machines snoop addresses that are already in the pipeline. (Resulting in massively slow machine-clears for self-modifying code.) – Peter Cordes Nov 08 '18 at 11:15
  • Thank you all for clearing out this issue for me – Pooshkis Nov 09 '18 at 11:08

1 Answer

When you look at that disassembly (before executing the first instruction), the memory is already loaded with the machine code. I will assume this is a DOS COM file, so cs=ds=ss=0x72A and the first mov self-modifies the second mov.

So the memory already contains the following (the middle column is the machine code bytes in hexadecimal):

072A:100 C70607016745   (mov word ptr [0107], 4567) <- cs:ip points here
072A:106 B83412         (mov ax, 1234)
072A:109 01D0           (add ax, dx)

After executing the first instruction (C7 06 07 01 67 45: the CPU reads these 6 bytes and decodes them as a mov [..],.. instruction), the memory content changes to:

072A:100 C70607016745   (mov word ptr [0107], 4567)
072A:106 B86745         (mov ax, 4567)  <- cs:ip points here
072A:109 01D0           (add ax, dx)

If you disassemble the machine code now, you will see the second instruction as "mov ax, 4567". The CPU has no idea that the original source said mov ax, 1234, and as you can see from the machine code in memory, there is no way to reconstruct that: the value 1234h no longer appears anywhere in memory.
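The before/after listings can be reproduced with a small Python model. This is only a sketch: `mem` stands in for the 64 KiB segment at 072A (assuming the DOS COM layout with cs=ds), and `store_word` and `decode_mov_ax` are hypothetical helpers written for this illustration, not part of any real debugger.

```python
# Toy model of the segment at 072A (assumes a DOS .COM layout, cs = ds).
mem = bytearray(0x10000)

# Machine code bytes from the listing, loaded at offset 0x100.
code = bytes.fromhex("C70607016745" "B83412" "01D0")
mem[0x100:0x100 + len(code)] = code

def store_word(offset, value):
    """Write a 16-bit value little-endian, as `mov word ptr [offset], value` does."""
    mem[offset:offset + 2] = value.to_bytes(2, "little")

def decode_mov_ax(offset):
    """Disassemble a `mov ax, imm16` (opcode B8) located at `offset`."""
    assert mem[offset] == 0xB8
    imm = int.from_bytes(mem[offset + 1:offset + 3], "little")
    return f"mov ax, {imm:04X}"

before = decode_mov_ax(0x106)   # disassembly before anything has executed
store_word(0x107, 0x4567)       # effect of the first instruction
after = decode_mov_ax(0x106)    # same address, disassembled again

print(before)
print(after)
```

The opcode byte B8 at offset 106 is untouched; only the two operand bytes at 107 and 108 are overwritten, which is why the instruction is still a mov ax but with a different immediate.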

Also, when you reload the code from the executable, it will again be mov ax, 1234, because that is what the assembler stored in the binary, before anything was executed.

The machine code is not built at runtime from the source. The assembler produces the binary machine code at assembly time, so nothing exists at runtime that could "restore" the second instruction back to mov ax,1234 (the source and the assembler are not relevant at runtime).
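The difference between the file on disk and its loaded copy can be sketched in Python as well. Here `image` is a hypothetical stand-in for the contents of the COM file and `load` for the DOS loader; both names, and the helper `imm_of_second_mov`, are assumptions made for illustration.

```python
# `image` plays the role of the assembled .COM file: immutable bytes on disk.
image = bytes.fromhex("C70607016745" "B83412" "01D0")

def load(image):
    """Copy the file into a fresh, writable memory segment at offset 0x100."""
    mem = bytearray(0x10000)
    mem[0x100:0x100 + len(image)] = image
    return mem

def imm_of_second_mov(mem):
    """Return the immediate operand of the `mov ax, imm16` at offset 0x106."""
    return int.from_bytes(mem[0x107:0x109], "little")

mem = load(image)
mem[0x107:0x109] = (0x4567).to_bytes(2, "little")  # the self-modifying store
patched = imm_of_second_mov(mem)

reloaded = imm_of_second_mov(load(image))  # load the executable again

print(f"{patched:04X} {reloaded:04X}")
```

The store only ever touches the in-memory copy, so a fresh load starts again from the bytes the assembler wrote into the file.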

If this were some kind of interpreted language that prepares every instruction from source just before executing it, the first instruction would have to modify the source to cause self-modification at "interpretation time", but most interpreters offer no easy way to modify the source currently being interpreted.

Even toy/simulator machines designed to teach assembly (MARS/SPIM, or the 8-bit assembler simulator) operate at "runtime" on binary machine code, not source code. They may or may not let self-modification propagate into the simulation; some simulators may ignore it and protect the original machine code from modification for whatever reason.
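A minimal sketch of such a simulator, in Python, shows that only bytes matter at run time. It understands just the three encodings from the listing; the `run` function, the register dictionary, and the initial `dx` value are assumptions made for this illustration.

```python
def run(mem, ip, regs, steps):
    """Fetch, decode and execute `steps` instructions starting at `ip`."""
    for _ in range(steps):
        op = mem[ip]
        if op == 0xC7 and mem[ip + 1] == 0x06:       # mov word ptr [disp16], imm16
            addr = int.from_bytes(mem[ip + 2:ip + 4], "little")
            mem[addr:addr + 2] = mem[ip + 4:ip + 6]  # copy the immediate into memory
            ip += 6
        elif op == 0xB8:                             # mov ax, imm16
            regs["ax"] = int.from_bytes(mem[ip + 1:ip + 3], "little")
            ip += 3
        elif op == 0x01 and mem[ip + 1] == 0xD0:     # add ax, dx
            regs["ax"] = (regs["ax"] + regs["dx"]) & 0xFFFF
            ip += 2
        else:
            raise ValueError(f"unsupported opcode {op:02X} at {ip:04X}")
    return ip

mem = bytearray(0x10000)
mem[0x100:0x10B] = bytes.fromhex("C70607016745" "B83412" "01D0")
regs = {"ax": 0, "dx": 0x0001}   # dx value chosen arbitrarily for the demo

ip = run(mem, 0x100, regs, steps=3)
print(f"ax={regs['ax']:04X} ip={ip:04X}")
```

Because this loop fetches each instruction from `mem` immediately before executing it, the patched operand is what ends up in ax. A simulator that decoded the whole program up front would behave like the ones that ignore self-modification.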

Warning for assembly newcomers: while self-modifying code may sound cool at first (at least it did to me), it is strongly discouraged:

1) You can NOT use it by default in modern software, unless you go to some lengths to enable it (modern operating systems normally map code pages as non-writable).
2) It hurts the performance of modern CPUs a lot. By the time a modern x86 CPU detects the write at 107h, it has already fetched, decoded and speculatively executed several instructions down the line, so it has to throw all of that "future" work away, flush the pipeline, and start over. An instruction like mov ax,1234, which might have executed in a single cycle or even alongside another instruction, may instead take 100+ cycles.
3) It allows for hard-to-find bugs if you are not experienced enough to foresee all the implications of such code.

So it is valuable to understand the concept and what happens, but don't use it unless you are doing something extra niche/specialized, such as a 256-byte intro where it saves you two bytes; then it is valid.

Ped7g