1

I'm trying to get a better understanding of how assembly and machine code work. So I'm compiling this simple snippet with gcc:

#include <stdio.h>
int main(){
    printf("Hello World!");
    return 0;
}

But this pulls in the default libraries. I would like to output hello world without using printf, by inlining some assembly in the C file and adding the -nostdlib and -nodefaultlibs options to gcc. How can I do that? I'm using Windows 10 and mingw-w64 on an Intel Core i7-6700HQ (laptop processor). Can I use NASM with gcc on Windows?

Michael Petch
Mister Fresh
  • 1
    It would be easier on Linux... There is a simple syscall interface – Antti Haapala -- Слава Україні Aug 05 '19 at 20:52
  • 2
    You cannot directly manipulate the hardware unless your code runs in the kernel. Userland programs must be content to make system calls instead. – John Bollinger Aug 05 '19 at 20:54
  • 1
    I'll echo @JohnBollinger here in regards to the OS getting between you and the hardware. If you really want to get an understanding of how a machine works at such a low level, you might consider getting a microcontroller where the code you compile runs directly on the hardware without any OS getting between you and the metal. If you want to print "hello world", you can get yourself a little display peripheral, read the datasheet to figure out how to wire it up and communicate to it. – Christian Gibbons Aug 05 '19 at 20:57
  • @ChristianGibbons could I do that on a raspberry pi 4 ? – Mister Fresh Aug 05 '19 at 20:58
  • Fair enough, @ChristianGibbons, but the question specifies a hosted (on Windows 10) environment. – John Bollinger Aug 05 '19 at 21:00
  • @JohnBollinger windows would be easier because of drivers. I installed ubuntu earlier but for some reason the laptop fans would not work and it would overheat especially in summer. So the OS prevents any direct access of any code I write to hardware ? except memory allocation maybe? – Mister Fresh Aug 05 '19 at 21:08
  • 2
    Is linking `Kernel32.lib` allowed? Technically that's not the standard library. – Piotr Praszmo Aug 05 '19 at 21:09
  • @JohnBollinger I interpreted the platform in the question to simply be what they currently have to work with, rather than a requirement. It is possible I may have misunderstood exactly what it is the OP is trying to learn. My interpretation was a desire to better understand the relation between programming and the underlying machine, which I figured a simple microcontroller could help facilitate. – Christian Gibbons Aug 05 '19 at 21:09
  • In Posix more generally you could `write(0,...)`. – Lee Daniel Crocker Aug 05 '19 at 21:44
  • Does not including the standard library mean that the Windows library code that calls main() would not be included either? You'd have to code a replacement, possibly requiring some assembly, in order to produce a Windows runnable program with no library modules. – rcgldr Aug 06 '19 at 02:56
  • 1
    @MisterFresh: Raspberry Pi runs Linux, which has a stable/documented ABI for system calls, unlike Windows where you're only "supposed" to use DLL calls. But apparently Windows loads some DLLs by default into your process even if you don't specifically link against them. But anyway, yes it's easier to make system calls directly on Linux than on Windows, and the file-descriptor number for `stdout` is guaranteed to be `1` so you don't need a `GetStdHandle` or anything. But RPi is ARM, not x86, so it's a different assembly language! – Peter Cordes Aug 06 '19 at 03:34
  • @LeeDanielCrocker: You mean `write(1, ...)`. FD 0 = stdin, FD 1 = stdout. But the normal state (without any redirection) is fd 0,1,2 all refer to the same file descript*ion* which is opened read/write. So yes it does work by accident to `write(0, ...)` unless you pipe/redirect stdout. – Peter Cordes Aug 06 '19 at 03:35
  • 1
    @MisterFresh: Or if you want to write your own bootloader that runs on the bare metal, I think you can boot that on a RPi. But I'm not sure what it has in terms of graphics hardware that you'd need to program manually. I'd highly suggest getting comfortable with asm in user-space programs under an OS (doing the same things as compiler-generated code would, and/or looking at compiler-generated asm for comparison) before you start booting your own code on bare metal and having to write drivers and stuff. – Peter Cordes Aug 06 '19 at 03:44

2 Answers

3

I recommend against using GCC's inline assembly; it is hard to get right. You ask: Can I use NASM with GCC on Windows? The answer is YES, please do! You can assemble your 64-bit NASM code into a Win64 object file and then link it with your C program.

You need some knowledge of the Win64 API. Unlike on Linux, you aren't supposed to make system calls directly. Instead you call the Windows API, which is a thin wrapper around the system call interface.

For the purposes of writing to the console using the Console API you need to use a function like GetStdHandle to get a handle to STDOUT and then call a function like WriteConsoleA to write an ANSI string to the console.

When writing assembly code you have to have knowledge of the calling convention. Win64 calling convention is documented by Microsoft. It is also described in this Wiki article. A summary from the Microsoft documentation:

Calling convention defaults

The x64 Application Binary Interface (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers. There's a strict one-to-one correspondence between the arguments to a function call and the registers used for those arguments. Any argument that doesn’t fit in 8 bytes, or isn't 1, 2, 4, or 8 bytes, must be passed by reference. A single argument is never spread across multiple registers. The x87 register stack is unused, and may be used by the callee, but must be considered volatile across function calls. All floating point operations are done using the 16 XMM registers. Integer arguments are passed in registers RCX, RDX, R8, and R9. Floating point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L. 16-byte arguments are passed by reference. Parameter passing is described in detail in Parameter Passing. In addition to these registers, RAX, R10, R11, XMM4, and XMM5 are considered volatile. All other registers are non-volatile.

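My note: the shadow store is 32 bytes that the caller must reserve on the stack before calling any C or Win64 API function. It sits at the lowest addresses of the call area ([rsp] through [rsp+31] at the time of the call), directly below any stack arguments.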

This is a NASM program that calls a WriteString function, which takes the string to print as its first parameter and the length of the string as its second. WinMain is the default entry point for Windows console programs:

global WinMain                  ; Make the default console entry point globally visible
global WriteString              ; Make function WriteString globally visible          

default rel                     ; Default to RIP relative addressing rather
                                ;     than absolute

; External Win API functions available in kernel32
extern WriteConsoleA
extern GetStdHandle
extern ExitProcess

SHADOW_AREA_SIZE  EQU 32
STD_OUTPUT_HANDLE EQU -11

; Read Only Data section
section .rdata use64
strBrownFox db "The quick brown fox jumps over the lazy dog!"
strBrownFox_len equ $-strBrownFox

; Data section (read/write)
section .data use64

; BSS section (read/write) zero-initialized
section .bss use64
numCharsWritten: resd 1      ; reserve space for one 4-byte dword

; Code section
section .text use64

; Default Windows entry point in 64-bit code
WinMain:
    sub rsp, 8+SHADOW_AREA_SIZE
                             ; Align stack on 16-byte boundary and reserve the
                             ;     32-byte shadow area for the functions we call.
                             ;     8 bytes were pushed by the CALL that reached us.
                             ;     8+8+32=48

    lea rcx, [strBrownFox]   ; Parameter 1 = address of string to print
    mov edx, strBrownFox_len ; Parameter 2 = length of string to print
    call WriteString

    xor ecx, ecx             ; Exit and return 0
    call ExitProcess

WriteString:
    push rbp
    mov rbp, rsp             ; Creating a stack frame is optional
    push rdi                 ; Non volatile register we clobber that has to be saved
    push rsi                 ; Non volatile register we clobber that has to be saved
    sub rsp, 16+SHADOW_AREA_SIZE
                             ; Everything placed on the stack must add up to a
                             ;     multiple of 16 to keep RSP aligned. That includes
                             ;     the return address pushed by the CALL that reached
                             ;     WriteString (8), RBP (8), the two registers we save
                             ;     and restore (8+8), room for the extra stack
                             ;     parameters of the WinAPI calls we make, plus padding
                             ;     (16), and the shadow area (32).
                             ;     8+8+8+8+16+32=80, and 80 is evenly divisible by 16,
                             ;     so the stack is properly aligned after the SUB

    mov rdi, rcx             ; Store string address to RDI (Parameter 1 = RCX)
    mov esi, edx             ; Store string length to RSI (Parameter 2 = RDX)

    ; HANDLE WINAPI GetStdHandle(
    ;  _In_ DWORD nStdHandle
    ; );
    mov ecx, STD_OUTPUT_HANDLE
    call GetStdHandle

    ; BOOL WINAPI WriteConsole(
    ;  _In_             HANDLE  hConsoleOutput,
    ;  _In_       const VOID    *lpBuffer,
    ;  _In_             DWORD   nNumberOfCharsToWrite,
    ;  _Out_            LPDWORD lpNumberOfCharsWritten,
    ;  _Reserved_       LPVOID  lpReserved
    ; );

    mov rcx, rax             ; RCX = File Handle for STDOUT.
                             ; GetStdHandle returned the 64-bit handle in RAX

    mov rdx, rdi             ; RDX = address of string to display
    mov r8d, esi             ; R8D = length of string to display       
    lea r9, [numCharsWritten]
    mov qword [rsp+SHADOW_AREA_SIZE+0], 0
                             ; 5th parameter passed on the stack above
                             ;     the 32 byte shadow space. Reserved needs to be 0 
    call WriteConsoleA

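    add rsp, 16+SHADOW_AREA_SIZE
                             ; Release the area reserved by the SUB in the prologue
                             ;     so RSP points at the saved RSI again
    pop rsi                  ; Restore the non volatile registers we clobbered
    pop rdi
    pop rbp                  ; RSP == RBP here, so no MOV RSP, RBP is needed
    ret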

You can assemble and link with these commands:

nasm -f win64 myprog.asm -o myprog.obj
gcc -nostartfiles -nostdlib -nodefaultlibs myprog.obj -lkernel32 -lgcc -o myprog.exe

When you run myprog.exe it should display:

The quick brown fox jumps over the lazy dog!

You can also compile C files into object files and link them to this code and call them from assembly as well. In this example GCC is simply being used as a linker.


Compiling C Files and Linking with Assembly Code

This example is similar to the first one except we create a C file called cfuncs.c that calls our assembly language WriteString function to print Hello, world!:

cfuncs.c

#include <stddef.h>  /* size_t; a freestanding header, no C library needed */

/* WriteString is the assembly language function to write to console */
extern void WriteString (const char *str, int len);

/* Implement strlen */
size_t strlen(const char *str)
{
    const char *s = str;
    for (; *s; ++s)
        ;

    return (s-str);
}

void PrintHelloWorld(void)
{
    const char *strHelloWorld = "Hello, world!\n";
    WriteString (strHelloWorld, (int)strlen(strHelloWorld));
    return;
}

myprog.asm

default rel                     ; Default to RIP relative addressing rather
                                ;     than absolute

global WinMain                  ; Make the default console entry point globally visible
global WriteString              ; Make function WriteString globally visible          

; Our own external C functions from our .c file
extern PrintHelloWorld

; External Win API functions in kernel32
extern WriteConsoleA
extern GetStdHandle
extern ExitProcess

SHADOW_AREA_SIZE  EQU 32    
STD_OUTPUT_HANDLE EQU -11

; Read Only Data section
section .rdata use64
strBrownFox db "The quick brown fox jumps over the lazy dog!", 13, 10
strBrownFox_len equ $-strBrownFox

; Data section (read/write)
section .data use64

; BSS section (read/write) zero-initialized
section .bss use64
numCharsWritten: resd 1      ; reserve space for one 4-byte dword

; Code section
section .text use64

; Default Windows entry point in 64-bit code
WinMain:
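    sub rsp, 8+SHADOW_AREA_SIZE
                             ; Align stack on 16-byte boundary and reserve the
                             ;     32-byte shadow area for the functions we call,
                             ;     including the C function PrintHelloWorld.
                             ;     8 bytes were pushed by the CALL that reached us.
                             ;     8+8+32=48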

    lea rcx, [strBrownFox]   ; Parameter 1 = address of string to print
    mov edx, strBrownFox_len ; Parameter 2 = length of string to print
    call WriteString

    call PrintHelloWorld     ; Call C function that prints Hello, world!

    xor ecx, ecx             ; Exit and return 0
    call ExitProcess

WriteString:
    push rbp
    mov rbp, rsp             ; Creating a stack frame is optional
    push rdi                 ; Non volatile register we clobber that has to be saved
    push rsi                 ; Non volatile register we clobber that has to be saved
    sub rsp, 16+SHADOW_AREA_SIZE
                             ; Everything placed on the stack must add up to a
                             ;     multiple of 16 to keep RSP aligned. That includes
                             ;     the return address pushed by the CALL that reached
                             ;     WriteString (8), RBP (8), the two registers we save
                             ;     and restore (8+8), room for the extra stack
                             ;     parameters of the WinAPI calls we make, plus padding
                             ;     (16), and the shadow area (32).
                             ;     8+8+8+8+16+32=80, and 80 is evenly divisible by 16,
                             ;     so the stack is properly aligned after the SUB

    mov rdi, rcx             ; Store string address to RDI (Parameter 1 = RCX)
    mov esi, edx             ; Store string length to RSI (Parameter 2 = RDX)

    ; HANDLE WINAPI GetStdHandle(
    ;  _In_ DWORD nStdHandle
    ; );
    mov ecx, STD_OUTPUT_HANDLE
    call GetStdHandle

    ; BOOL WINAPI WriteConsole(
    ;  _In_             HANDLE  hConsoleOutput,
    ;  _In_       const VOID    *lpBuffer,
    ;  _In_             DWORD   nNumberOfCharsToWrite,
    ;  _Out_            LPDWORD lpNumberOfCharsWritten,
    ;  _Reserved_       LPVOID  lpReserved
    ; );

    mov rcx, rax             ; RCX = File Handle for STDOUT.
                             ; GetStdHandle returned the 64-bit handle in RAX

    mov rdx, rdi             ; RDX = address of string to display
    mov r8d, esi             ; R8D = length of string to display       
    lea r9, [numCharsWritten]
    mov qword [rsp+SHADOW_AREA_SIZE+0], 0
                             ; 5th parameter passed on the stack above
                             ;     the 32 byte shadow space. Reserved needs to be 0 
    call WriteConsoleA

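    add rsp, 16+SHADOW_AREA_SIZE
                             ; Release the area reserved by the SUB in the prologue
                             ;     so RSP points at the saved RSI again
    pop rsi                  ; Restore the non volatile registers we clobbered
    pop rdi
    pop rbp                  ; RSP == RBP here, so no MOV RSP, RBP is needed
    ret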

To assemble, compile, and link to an executable you can use these commands:

nasm -f win64 myprog.asm -o myprog.obj
gcc -c cfuncs.c -o cfuncs.obj
gcc -nodefaultlibs -nostdlib -nostartfiles myprog.obj cfuncs.obj -lkernel32 -lgcc -o myprog.exe 

The output of myprog.exe should be:

The quick brown fox jumps over the lazy dog!
Hello, world!
Michael Petch
  • @PeterCordes : I actually was half asleep tonight and quickly typed the comments up and they were incorrect. I had noticed that error as well. Not only did I get that wrong, I also misread the question and am revising it as we speak. You'll see what I mean when I post the answer. I actually intended to delete this until I fixed it. – Michael Petch Aug 06 '19 at 03:31
  • `mov r9, numCharsWritten` - I would have used a stack slot, or at least a RIP-relative LEA (with `default rel`) instead of `mov r64, imm64` to put an address into a register. Same for `mov rcx, strBrownFox`, `lea rcx, [rel strBrownFox]` is the standard x86-64 way to put a static address into a register. I don't like `bits 64` either; better to get an assemble-time error than to allow accidentally assembling 64-bit machine code into a 32-bit object file. – Peter Cordes Aug 06 '19 at 04:34
  • @PeterCordes : the reason I didn't use a stack slot was primarily because I wanted to put something in one of the data sections to show where data went. Sometimes examples are intended to show varying features. I will accept the entry in BSS though as it demonstrates that well. – Michael Petch Aug 06 '19 at 04:35
  • @MisterFresh : There is a definite learning curve. Understanding the calling convention, getting stack alignment right (usually causes the most hassles), finding the functions in the WinAPI to do what you want can be a bit involved if you are new to assembly on Windows. – Michael Petch Aug 06 '19 at 06:17
  • Indeed, satisfying the calling convention, GPR preservation, stack alignment manually with each WinABI invocation is very tedious in 64bit mode. That's why I wrote macroinstructions which hide the boring chores and make the learning curve steeper and the build easier: https://euroassembler.eu/eadoc/#HelloWorld – vitsoft Aug 06 '19 at 21:03
2

On Linux you can do this in 32-bit NASM by placing the string in memory and invoking SYS_WRITE on the STDOUT file descriptor.

On Windows it is more convoluted and less of a useful learning experience, so I would recommend that you set up WSL or a Linux VM and follow the tutorials below.

See the following links for tutorials on how to do so:
32-bit (not supported in WSL): https://asmtutor.com/#lesson1
64-bit: http://briansteffens.com/introduction-to-64-bit-assembly/01-hello-world/

Link for setting up WSL:
https://learn.microsoft.com/en-us/windows/wsl/install-win10

Bryan
  • I have WSL but isnt that running in a VM ? It seems it would be adding more layers between the code and the metal. Maybe it would be easier on a raspberry pi ? – Mister Fresh Aug 05 '19 at 21:25
  • 2
    You can't use `INT 80h` on x86-64. That's the 32-bit ABI. See https://stackoverflow.com/questions/46087730/what-happens-if-you-use-the-32-bit-int-0x80-linux-abi-in-64-bit-code – Andrew Henle Aug 05 '19 at 21:25
  • @MisterFresh: Windows doesn't support direct use of any system calls. Call numbers are subject to change with different Windows versions, and aren't documented. So you *can* use them via `syscall`, but only as a toy experiment, not safely for code you want to distribute. WSL isn't a VM, it's just an emulation layer inside the kernel that translates system calls. (Although I think I read that newer WSL will basically run a real Linux kernel in a VM) – Peter Cordes Aug 05 '19 at 23:20
  • I wouldn't say it isn't useful. It isn't too difficult to have `WinMain` in assembly code and bypass the C startup. Writing native windows code isn't quite the same as having the OP do it in WSL. Calling the Win API is a bit involved given the alignment requirements and the shadow space compared to a Linux Syscall. – Michael Petch Aug 06 '19 at 05:26