Quoting from Intel's Optimization Manual (248966-042b, September 2019):
Assembly/Compiler Coding Rule 11. (M impact, H generality) When executing code from the Decoded Icache, direct branches that are mostly taken should have all their instruction bytes in a 64B cache line and nearer the end of that cache line. Their targets should be at or near the beginning of a 64B cache line.
Assembly/Compiler Coding Rule 47. (H impact, M generality) Try to arrange data structures such that they permit sequential access. (*)
Assembly/Compiler Coding Rule 51. (H impact, L generality) Always put code and data on separate pages. Avoid self-modifying code wherever possible. If code is to be modified, try to do it all at once and make sure the code that performs the modifications and the code being modified are on separate 4-KByte pages or on separate aligned 1-KByte subpages.
(*) An earlier version of the same manual had:
Assembly/Compiler Coding Rule 45. (H impact, H generality) Align data on natural operand size address boundaries. ... A 64-byte or greater data structure or array should be aligned so that its base address is a multiple of 64.
The easy way to follow these rules would be to use the ALIGN 64 and ALIGN 4096 (or ALIGN 1024) directives in the source text. However, ALIGN only aligns offsets within the program, and MS-DOS only loads an application at a 16-byte aligned memory address, so there is little chance that these ALIGN directives will actually produce the physical alignment required for optimal performance.
I wrote the following program to make sure that any application will be loaded by DOS at the start of a 4-KByte memory page. Thereafter every ALIGN in the application will perform as expected, that is, it will align to physical memory boundaries.
    ; ALIGN.COM (c) 2020 Sep Roland
    ; This tool creates a memory block so that any subsequent
    ; program will be loaded by DOS at the desired physical memory alignment.

            ORG     256

    ALIGNMENT equ   4096                ; 16, 32, 64, ...

            mov     dx, (ALIGNMENT/16)-1
            mov     ax, cs              ; PSP segment of this .COM program
            and     ax, dx              ; Paragraphs past the previous boundary
            jz      NOP                 ; Alignment OK
            sub     dx, ax              ; Filler paragraphs (MCB excluded)
            jnz     .b
    .a:     add     dx, (ALIGNMENT/16)  ; Too small, skip to the next boundary
    .b:     cmp     dx, 6               ; (a)
            jb      .a
    TSR:
            mov     es, [002Ch]         ; (b) Environment segment from the PSP
            mov     ah, 49h             ; DOS.ReleaseMemory
            int     21h
            mov     ax, 3100h           ; DOS.TerminateAndStayResident, DX paragraphs
            int     21h
    NOP:
            ret

            END
(a) DOS (version 6.20) refuses a TSR with fewer than 6 paragraphs, as these hold important PSP data.
(b) Releasing the environment is necessary so that allocating a new environment for the next task does not disturb the established alignment. And of course the paragraphs left behind are just (unused) data, not code that could need an environment string.
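To make the arithmetic concrete: if ALIGN.COM happens to run with CS = 1234h, it keeps 00FFh - 0034h = 00CBh paragraphs resident. Its block then covers segments 1234h to 12FEh, the next MCB sits at 12FFh, and the following memory block starts at 1300h. Whether the next program really lands there can be checked with a small companion program. The sketch below is hypothetical (CHECK.COM, MsgYes, MsgNo and Show are names I made up for illustration), assumes the same NASM/FASM-style syntax as the listing above, and assumes the program under test is a .COM file, so that CS equals the PSP segment.

```asm
; CHECK.COM (hypothetical companion sketch, not part of ALIGN.COM)
; Reports whether this program's PSP segment sits on an ALIGNMENT boundary.

        ORG     256

ALIGNMENT equ   4096                    ; Assumed to match the value in ALIGN.COM

        mov     dx, MsgYes              ; Assume success until proven otherwise
        mov     ax, cs                  ; In a .COM program CS = PSP segment
        test    ax, (ALIGNMENT/16)-1    ; All low bits clear means aligned
        jz      Show
        mov     dx, MsgNo
Show:
        mov     ah, 09h                 ; DOS.PrintString (DS:DX, '$'-terminated)
        int     21h
        ret                             ; Near return to PSP:0000 (int 20h)

MsgYes: db      'Load segment is aligned', 13, 10, '$'
MsgNo:  db      'Load segment is NOT aligned', 13, 10, '$'
```

Running CHECK.COM once before and once after ALIGN.COM should show the difference, provided nothing else allocates memory in between.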
Are there any IDEs, shells, or DOS clones in existence that do this kind of base alignment before loading the program to run?
When the program start is merely 16-byte aligned, what are alternative ways to get code and data alignment right vis-à-vis the CPU?