One use case is the EICAR test file which is a legitimate DOS executable COM file for testing antivirus programs.
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
It has to use self code modification because the executable file must contain only printable/typeable ASCII characters in the range [21h-60h, 7Bh-7Dh] which limits the number of encodable instructions significantly
The details are explained here
It's also used for floating-point operation dispatching in DOS
Some compilers will emit CD xx
with xx ranging from 0x34-0x3B in places of x87 floating-point instructions. Since CD
is the opcode for int
instruction, it'll jump into the interrupt 34h-3Bh and emulate that instruction in software if the x87 coprocessor is not available. Otherwise the interrupt handler will replace those 2 bytes with 9B Dx
so that later executions will be handled directly by x87 without emulation.
What is the protocol for x87 floating point emulation in MS-DOS?
Another usage is to optimize code during runtime
For example on an architecture without variable bit shifts (or when they're very slow) then they can be emulated using only constant shifts when the shift count is known far in advance by changing the immediate field containing the shift count in the instruction before control reaches that instruction and before the cache is loaded for running
It can also be used to change function calls to the most optimized version when there are multiple versions for different (micro-)architectures. For example you have the same function written in scalar, SSE2, AVX, AVX-512... and depending on the current CPU you'll choose the best one. It can be done easily using function pointers which are set at startup by the code dispatcher, but then you have one more level of indirection which is bad for the CPU. Some compilers support function multiversioning which automatically compiles to different versions, then at load time the linker will fix the function addresses to the desired ones. But what if you don't have compiler and linker support, and you don't want the indirection either? Just modify the call instructions yourself at startup instead of changing the function pointers. Now the calls are all static and can be predicted correctly by the CPU