
As a novice, I am following tutorials. One of them capitalizes all the characters in a string using inline assembly in VS 2022:

#include <iostream>

int main()
{
    char mystr[] = "Hello World:";

    _asm
    {
        mov ecx, length mystr
    my: cmp [mystr + ecx], 'a';
        jl nocap;
        cmp [mystr + ecx], 'z';
        ja nocap;
        sub [mystr + ecx], 32;

    nocap:
        loop my
    }

    std::cout << mystr;
}

My question is: why does this assembly not need section directives such as .data or .text, or a _start label? The examples I have seen might mix x86 asm and Linux asm.

Peter Cordes
mortenlund

1 Answer


Because it's inline asm! It's just inside the body of your C++ function, and the compiler chooses which section to put the machine code for functions into (.text).

You don't need section directives for the same reason C++ programs don't need manual section directives or GNU C __attribute__((section(".text"))) - C++ compilers have working defaults for where to put stuff.


In fact, MSVC doesn't even let you switch sections or use db to emit arbitrary bytes inside an _asm {} block. It's not a full assembler, because it has to parse and understand your asm to know which registers it might modify, so it knows what to save.

GNU C inline asm leaves it up to you to tell the compiler what your inline asm's outputs/inputs/clobbers are, and lets you emit arbitrary text which the assembler will assemble.

GCC literally works by generating a .s text file and running as to assemble it. GNU C inline asm ("add %1, %0" : "+r"(dst) : "r"(src)) works kind of like a compile-time printf, formatting custom text into that assembly file. You can do things that would break the compiler-generated code that follows, like switching sections without using .pushsection / .popsection to come back to the section the compiler was in, or switching asm syntax to something other than what the compiler was using.
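To make the operand syntax above concrete, here is a minimal sketch of the `add` example as a complete function. The function name `add_via_inline_asm` is made up for illustration. It assumes GCC or Clang targeting x86 with the default AT&T operand order, and falls back to plain C++ anywhere else so that it still builds. Because `add` reads its destination as well as writing it, the sketch declares `dst` with the read-write constraint `"+r"`.

```cpp
// Returns dst + src. On x86 with GCC or Clang the addition is performed by an
// inline-asm `add`; on other targets plain C++ is used instead.
int add_via_inline_asm(int dst, int src) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    // Assumption: default AT&T syntax (source operand first, destination last).
    // The compiler substitutes register names for %0 and %1 in the text.
    asm("add %1, %0"
        : "+r"(dst)   // %0: read and written, so "+r" rather than "=r"
        : "r"(src)    // %1: read only
        : "cc");      // add modifies the flags
#else
    dst += src;       // portable fallback for other compilers and ISAs
#endif
    return dst;
}
```

Calling `add_via_inline_asm(2, 3)` returns 5 on either path. Compiling with `gcc -S` shows the template text pasted into the compiler's own output, in whatever section the compiler had already chosen for the function.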

Or you can do useful things, like using .pushsection .data to emit some bytes in .data and then coming back. The Linux kernel makes some use of this: e.g. in arch/x86/include/asm/alternative.h, it uses .pushsection .smp_locks to record the address of the lock prefix for atomic RMW instructions (using .long to emit 4 bytes). If the kernel boots on a machine with only one CPU, it can then patch those lock prefixes to nop or a dummy prefix, since the relevant instructions are all atomic with respect to interrupts, just not with respect to other CPUs that could be running at the same time.
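The .pushsection / .popsection pattern can be sketched in a few lines. This is not the kernel's code: the symbol `demo_bytes` and the function `sum_demo_bytes` are invented for illustration, and the asm branch assumes GCC or Clang targeting an ELF platform such as Linux. Other toolchains take a plain C++ fallback.

```cpp
#if defined(__GNUC__) && defined(__ELF__)
// File-scope asm: switch to .data, emit four bytes under a global label,
// then pop back to whichever section the compiler was emitting into.
asm(".pushsection .data\n\t"
    ".globl demo_bytes\n"
    "demo_bytes:\n\t"
    ".byte 1, 2, 3, 4\n\t"
    ".popsection");
// Tell the C++ side about the symbol that the asm text defines.
extern "C" unsigned char demo_bytes[4];
#else
// Fallback (assumption: non-ELF or non-GNU toolchain): define it in C++.
extern "C" { unsigned char demo_bytes[4] = {1, 2, 3, 4}; }
#endif

// Reads back the bytes to show they really ended up in the program's data.
int sum_demo_bytes() {
    int sum = 0;
    for (unsigned char b : demo_bytes) sum += b;
    return sum;
}
```

Using .popsection means the asm does not need to know which section was active before it ran. Replacing the pair with a bare `.data` directive would leave the assembler in .data when the compiler's next output arrives.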


You also don't need to write your own _start (the standard name for the process entry point in Linux), because the C++ compiler links your program with CRT startup code that provides one. That code calls main and, if main returns, exits with main's return value. If you wanted to write your own, you could do so in GNU C inline asm for Linux and build with gcc -nostdlib (which implies -nostartfiles); see "How Get arguments value using inline assembly in C without Glibc?"

For MSVC, you'd need to use a separate .asm file; MSVC doesn't allow inline asm at global scope. Or maybe you could use a __declspec(naked) function to define a WinMain or whatever the actual entry point is for an executable, with your hand-written asm instructions doing the necessary setup, like calling standard library init functions, and then call main / push eax / call ExitProcess (or actually call exit to make sure cleanup happens, like flushing stdio buffers).

Peter Cordes
  • Thanks for a very detailed answer. I look forward to being able to distinguish all of this, which might take a while or two. Best wishes. – mortenlund Aug 21 '22 at 18:16
  • @mortenlund: Perhaps looking at the pure asm generated by the compiler would be informative; you can see what directives it uses when compiling your C++ (including the inline asm). On https://godbolt.org/z/8bTWv9Pfv , right click on a C++ source line and "reveal linked code" to scroll past the noise of directives and stuff in MSVC's output from the header. Or make a version of it that doesn't include anything or use `cout`, just the asm part and return, to simplify the asm output. – Peter Cordes Aug 21 '22 at 21:00
  • In that link I unchecked the "directives" filter so it would leave in lines like `_TEXT SEGMENT` (that's MASM syntax, not GAS `.section .text` or NASM `section .text`). See also [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) – Peter Cordes Aug 21 '22 at 21:00
  • It sounds very reasonable. I will look into that. Thank you for sharing this with me. Your path is showing the exact same example as the tutorial. : ) – mortenlund Aug 26 '22 at 13:20