
Quoting from Intel's Optimization Manual (248966-042b, September 2019):

Assembly/Compiler Coding Rule 11. (M impact, H generality) When executing code from the Decoded Icache, direct branches that are mostly taken should have all their instruction bytes in a 64B cache line and nearer the end of that cache line. Their targets should be at or near the beginning of a 64B cache line.

Assembly/Compiler Coding Rule 47. (H impact, M generality) Try to arrange data structures such that they permit sequential access. (*)

Assembly/Compiler Coding Rule 51. (H impact, L generality) Always put code and data on separate pages. Avoid self-modifying code wherever possible. If code is to be modified, try to do it all at once and make sure the code that performs the modifications and the code being modified are on separate 4-KByte pages or on separate aligned 1-KByte subpages.

(*) An earlier version of the same manual had:

Assembly/Compiler Coding Rule 45. (H impact, H generality) Align data on natural operand size address boundaries. ... A 64-byte or greater data structure or array should be aligned so that its base address is a multiple of 64.


The easy way to follow these rules would be to use the ALIGN 64 and ALIGN 4096 (or ALIGN 1024) directives in the source text. However, since MS-DOS only guarantees that an application is loaded at a 16-byte (paragraph) aligned memory address, there's little chance that those ALIGNs will actually produce the alignment required for optimal performance.

I wrote the following program to make sure that any subsequently started application will be loaded by DOS at the start of a 4-KByte memory page. Thereafter every ALIGN in that application will perform as expected, that is, align to physical memory boundaries.

```
; ALIGN.COM (c) 2020 Sep Roland
; This tool creates a memory block in order to achieve that any subsequent
; program will be loaded by DOS at the desired physical memory alignment.

    ORG 256

ALIGNMENT equ 4096      ; 16, 32, 64, ...

    mov dx, (ALIGNMENT/16)-1
    mov ax, cs
    and ax, dx
    jz  NOP             ; Alignment OK
    sub dx, ax          ; Filler paragraphs (MCB excluded)
    jnz .b
.a: add dx, (ALIGNMENT/16)
.b: cmp dx, 6           ; (a)
    jb  .a
TSR:
    mov es, [002Ch]     ; (b)
    mov ah, 49h         ; DOS.ReleaseMemory
    int 21h
    mov ax, 3100h       ; DOS.TerminateAndStayResident
    int 21h
NOP:
    ret

    END
```

(a) DOS (version 6.20) refuses to keep fewer than 6 paragraphs resident, since those paragraphs hold important PSP data.
(b) Releasing the environment is necessary so that allocating a new environment for the next program doesn't upset the established alignment. And of course the paragraphs left behind are just (unused) data, not code that could need an environment string.

Are there any IDEs, shells, or DOS clones that do this kind of base alignment before loading the program to run?

When the program start is merely 16-byte aligned, what are alternative ways to get code and data alignment right with respect to the CPU?

Sep Roland
  • I suppose you could use `LOADFIX` to specify a load address for your DOS program. But apart from that? Interesting question! – fuz May 03 '20 at 15:57
  • Other than initialized static data, could you maybe leave space at the start of the .data section so you can adjust `ds` until the DS segment base is 64-byte aligned? Maybe memmove initialized static data, and just make sure the BSS is 48 or 64 bytes larger than necessary to make sure there's room to use it offset. – Peter Cordes May 03 '20 at 16:00
  • Note that modern CPUs are actually much better at avoiding SMC stalls than the optimization manual's guidelines; e.g. [Skylake apparently](https://stackoverflow.com/a/61481672/224132) only has a problem when code and data are in the same 64-byte cache line; adjacent lines = no penalty. That's impossible if they're more than 64 bytes apart. (Relative alignment is unnecessary beyond a sufficient distance, so 1k would do it.) In real mode there's no iTLB vs. dTLB benefit to keeping read-only data in different pages from code. – Peter Cordes May 03 '20 at 16:05
  • When DOS was rampant, most of those hints didn't apply. So I don't think you'll find tools to align code and data. Also, if I had to do it, I'd simply pad the data segment. I know it uses space on disk but it's way easier to read. With MZ EXEs, you can probably use a suitable BSS section. – Margaret Bloom May 03 '20 at 17:23
  • I'm not sure why this really matters; compilers don't try to give things more than 16-byte alignment except when it's a requirement. – Ross Ridge May 04 '20 at 01:58
  • I wonder if page alignment is even relevant in real mode where there is no paging. – fuz Jul 26 '20 at 14:01
  • @fuz It's true that I've never had much luck with these code and data alignments that the optimization manual proposes. **The only form of alignment that seems to work well in real address mode is dword stack alignment**. That one has a huge impact! This misfortune with alignment in general is what made me write the ALIGN.COM program in the first place. – Sep Roland Jul 26 '20 at 14:16
