
I'm learning MASM from Microsoft's official docs, but there is almost nothing written about the OPTION directive (https://learn.microsoft.com/en-us/cpp/assembler/masm/option-masm?view=msvc-170). It is used in the standard library implementation of memcpy, where it seems to work properly.

My code:


title entry - general purpose testing ground for asm operations
include ksamd64.inc
        subttl  "entry"

    NESTED_ENTRY entry, _TEXT
    option PROLOGUE:NONE, EPILOGUE:NONE
;       error here: "A2220 Missing .ENDPROLOGUE"
        cvtpi2ps xmm0, qword ptr[rcx]
        rsqrtps xmm1, xmm0
        movaps xmmword ptr[rcx], xmm1
    
        cvtpi2ps xmm0, qword ptr[rcx+8]
        rsqrtps xmm1, xmm0
        movups xmmword ptr[rcx+8], xmm1
    .beginepilog
        ret
    NESTED_END entry, _TEXT
end
Badasahog
  • Separate from MASM shenanigans and inconveniences, it's probably faster to convert and rsqrt 4 floats at once instead of 2, with `cvtdq2ps xmm0, [rcx]` (aligned 16-byte load allows folding a memory source operand instead of needing a separate `movups`) / `rsqrtps xmm0,xmm0` / `movaps [rcx], xmm0` (aligned 16-byte store is apparently ok). If you actually need to store another 8 bytes of garbage starting at `[rcx+16]` like you're doing with the 16-byte `movups` store (from rsqrtps on whatever bit-patterns were in the high half of xmm0 on function entry), you can do that if you want. – Peter Cordes May 06 '22 at 13:52
  • `cvtdq2ps` writes the full destination with no false dependency, vs. `cvtpi2ps xmm, mm/m64` merging a new low qword into an XMM with a false dependency. So even calling this in a loop, there's no downside to rsqrt in the same register. (Of course, you should inline this into a loop instead of calling for every vector! If you're using C or C++, intrinsics like `_mm_load_si128` (or `loadu` if maybe not aligned) / `_mm_cvtepi32_ps` will save you the trouble of hand-writing asm.) – Peter Cordes May 06 '22 at 13:56
  • @PeterCordes that was just example code I put in, has nothing to do with my question. – Badasahog May 06 '22 at 19:19
  • I know, that's why I put them as comments instead of as an answer. But clearly you got that code from somewhere, so I thought it would be helpful to point out that it's inefficient and possibly buggy. If not you, then for future readers who see those instructions. If you wanted a more minimal [mcve] without distractions, you could have trimmed it to just a `ret` as the function body, if that still reproduces the error message. – Peter Cordes May 06 '22 at 19:21
  • @PeterCordes it's not a typo. That's how they want you to spell it for some reason. That's how it's spelled in the MSVC standard library implementation. – Badasahog May 10 '22 at 21:03
  • Oh nvm, `.beginepilog` is "Begin Epilog / Epilogue". That's totally normal, I'm just half asleep and was looking for any "prolog" directives, even if they were mis-placed not near the start of the function. – Peter Cordes May 10 '22 at 21:37

1 Answer

This is indeed poorly documented, but isn't your code mixing versions/environments? I have dug up the following info from different corners of the web.

About .ENDPROLOG

Signals the end of the prologue declarations. It is an error to use any of the prologue declarations outside of the region between PROC FRAME and .ENDPROLOG.
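
To make the rule above concrete, here is a minimal sketch of a framed x64 procedure written with the raw ml64 directives rather than the ksamd64.inc macros. The procedure name `sample` and the 40h stack size are hypothetical, chosen only for illustration. It also assumes, as the A2220 message suggests, that `NESTED_ENTRY` in ksamd64.inc expands to a `PROC FRAME` declaration, which is why the assembler expects a matching `.ENDPROLOG`.

```masm
; Minimal sketch for ml64.exe (x64 MASM), no include files needed.
; "sample" is a hypothetical name used only for illustration.
_TEXT   segment
sample  proc frame              ; FRAME asks ml64 to emit unwind data
        push    rbp             ; prologue instruction...
        .pushreg rbp            ; ...and its matching unwind declaration
        sub     rsp, 40h        ; reserve stack space
        .allocstack 40h         ; declare that reservation to the unwinder
        .endprolog              ; ends the prologue declarations; without it ml64 reports A2220

        ; function body goes here

        add     rsp, 40h        ; epilogue: undo the prologue in reverse order
        pop     rbp
        ret
sample  endp
_TEXT   ends
        end
```

A function that neither adjusts the stack nor saves nonvolatile registers can be declared with a plain `PROC` (no `FRAME`), in which case no `.ENDPROLOG` is expected.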


About NESTED_ENTRY

A NESTED_ENTRY must have an associated PROLOG_END (SH-4) and ENTRY_END (SH-4).

About PROLOG_END

This macro must appear following a NESTED_ENTRY (SH-4) or LEAF_ENTRY (SH-4) macro.
It appears after the prolog area and before the matching ENTRY_END (SH-4) macro.

About ENTRY_END

This macro ends the current routine specified by NESTED_ENTRY (SH-4) or LEAF_ENTRY (SH-4).
Syntax: ENTRY_END [Name]
Name should be the same name used in the NESTED_ENTRY or LEAF_ENTRY macros.
The ENTRY_END (SH-4) macro currently ignores Name.


You didn't use PROLOG_END and you wrote NESTED_END instead of ENTRY_END.

Why do you use the line `option PROLOGUE:NONE, EPILOGUE:NONE`? Perhaps simply remove it.

Sep Roland
  • He mentioned that the options were used in Microsoft `memcpy` code. I did find something that may be related to that: https://gist.github.com/Const-me/3290266d2a5f51409eb813d39b28007c – Michael Petch May 10 '22 at 23:45
  • https://learn.microsoft.com/en-us/cpp/assembler/masm/option-masm?view=msvc-170 It's listed as an option for the `OPTION` keyword, but it doesn't seem to do anything. – Badasahog May 12 '22 at 21:07