
I am studying assembly language for the x86 family of processor architectures (32-bit and 64-bit) on Windows. I'm not a complete beginner, but I evidently don't know everything, at least about the syntax of the MASM assembler.

I use the 64-bit MASM assembler that ships with Visual Studio 2019, located in its folders: "..\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\ml64.exe". I am running Windows 7.

I wrote my program for a 32-bit system; the 32-bit MASM assembled it without problems and it worked. Then I ported the code to the 64-bit architecture (a few changes are needed in the code for that). But when I assemble it with the 64-bit MASM, the linker reports an unresolved external symbol "StartOfProgram". Here is the console output:

C:\Assembler>cd "C:\Assembler"

C:\Assembler>"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\ml64.exe" "C:\Assembler\Main.asm" /link /subsystem:windows /entry:StartOfProgram
Microsoft (R) Macro Assembler (x64) Version 14.29.30138.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: C:\Assembler\Main.asm
Microsoft (R) Incremental Linker Version 14.29.30138.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/OUT:Main.exe
Main.obj
/subsystem:windows
/entry:StartOfProgram
LINK : error LNK2001: unresolved external symbol StartOfProgram.
Main.exe : fatal error LNK1120: unresolved external symbols: 1

I spent somewhere between two weeks and a month searching for a solution to this error, but I didn't find one.

Originally the error was about an unresolved symbol "WinMainCRTStartup". I later realized that this entry point was chosen by default, because I had not specified one explicitly (with the "/entry:" option shown in the console output above). But the "unresolved external symbol" problem remained even after I set the entry point where I needed it (that is, at "StartOfProgram").


Here is the code of the 64-bit version of my program, which just has to show "Hello world" in a pop-up window:

option  casemap:none    ; As far as I understand, Windows API function names are case-sensitive, so case mapping must be off

; **** Imports ****

includelib  "C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64\kernel32.lib"   ; Import library for the core Windows API functions

extern      LoadLibraryA:near    ; Functions used in this program, resolved through the import library
extern      GetProcAddress:near
extern      FreeLibrary:near
extern      ExitProcess:near

; **** Data segment ****

.data

        text                    db  'Hello world', 0            ; Text shown in the message box
        header                  db  'Title of hello world', 0   ; Title of the message box
        
        nameOfDLL               db  'user32.dll', 0
        nameOfProcedureOfDLL    db  'MessageBoxA', 0

        handlerToModule         dq  0   ; A module handle (HMODULE) is pointer-sized, so it needs 64 bits here
        addressOfProcedureOfDLL dq  0   ; In a 64-bit process addresses are 64 bits wide, so this variable is a quad word (dq)

.code

; **** Entry point to program ****

StartOfProgram:    ; For some reason, MASM recommends putting "_" before the entry-point label in 32-bit programs, so in the 64-bit one I don't.

        mov     rcx, offset nameOfDLL
        sub     rsp, 40                         ; 32 bytes of "shadow space" plus 8 bytes to align the stack to 16 bytes, as the x64 calling convention requires
        call    LoadLibraryA                    ; Load the DLL dynamically so that I can then take a function from it
        add     rsp, 40
        
        mov     qword ptr handlerToModule, rax
        
        mov     rcx, rax                        ; The x64 calling convention passes the first four arguments in RCX, RDX, R8 and R9. RAX still holds the module handle returned by LoadLibraryA, so I copy it from the register instead of reloading it from memory
        mov     rdx, offset nameOfProcedureOfDLL
        sub     rsp, 40
        call    GetProcAddress
        add     rsp, 40
        
        mov     addressOfProcedureOfDLL, rax    ; Save the procedure address returned by GetProcAddress. Addresses are 64 bits wide here, so I store RAX, not EAX
        
        mov     rcx, 0
        mov     rdx, offset text
        mov     r8, offset header
        mov     r9, 0
        sub     rsp, 40
        call    addressOfProcedureOfDLL         ; Indirect call through the memory variable that holds the function's address, without first loading that address into a register
        add     rsp, 40        

        mov     rcx, qword ptr handlerToModule  ; FreeLibrary takes the handle value itself, not the address of the variable
        sub     rsp, 40
        call    FreeLibrary
        add     rsp, 40

        mov     rcx, 0
        sub     rsp, 40
        call    ExitProcess
        add     rsp, 40

end

Here is the code of the 32-bit version of this program (which assembled and worked normally):

.386    ; Selects the minimum processor instruction set (newer x86 processors remain compatible, so far, with the instructions of older ones)

option  casemap:none    ; As far as I understand, Windows API function names are case-sensitive, so case mapping must be off

; **** Imports ****

includelib  "C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x86\kernel32.lib"   ; Import library for the core Windows API functions
;includelib  "C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x86\User32.lib"

extern      _LoadLibraryA@4:near    ; Functions used in this program, resolved through the import library
extern      _GetProcAddress@8:near
extern      _FreeLibrary@4:near
extern      _ExitProcess@4:near

.model flat

; **** Data segment ****

.data

        text                    db  'Hello world', 0            ; Text shown in the message box
        header                  db  'Title of hello world', 0   ; Title of the message box

        nameOfDLL               db  'user32.dll', 0
        nameOfProcedureOfDLL    db  'MessageBoxA', 0

        handlerToModule         dd  0
        addressOfProcedureOfDLL dd  0

.code

; **** Entry point to program ****

_StartOfProgram:    ; For some reason, MASM recommends putting "_" before the entry-point label in 32-bit programs

        push    offset nameOfDLL
        call    _LoadLibraryA@4                 ; Load the DLL dynamically so that I can then take a function from it

        mov     handlerToModule, eax

        push    offset nameOfProcedureOfDLL
        push    eax                             ; Windows API functions use the stdcall convention: arguments are pushed right to left, so the first argument (EAX) is pushed last. EAX still holds the module handle returned by LoadLibraryA, so I push the register instead of reloading it from memory
        call    _GetProcAddress@8

        mov     addressOfProcedureOfDLL, eax    ; Save the procedure address returned by GetProcAddress

        push    0
        push    offset header
        push    offset text
        push    0
        call    addressOfProcedureOfDLL

        push    handlerToModule
        call    _FreeLibrary@4

        push    0
        call    _ExitProcess@4

end _StartOfProgram

And here is the result of the 32-bit version of the program: [image: Result of 32-bit version of program]

  • Instead of just having StartOfProgram be a label, declare it as `StartOfProgram proc`. You'll need to add a matching `StartOfProgram endp` just before `end`. – David Wohlferd Jul 19 '22 at 09:09
  • @David Wohlferd, I want to use only my own entry point, the one my "StartOfProgram" label marks, as it was in 32-bit MASM. That is mainly because I suspect `proc`/`endp` are to some extent high-level and, like a macro, could put code into my program that I did not write. Is there a way to do without proc and endp? – Developer Jul 19 '22 at 10:20
  • *For some reason* - Probably to be consistent with the Windows convention that C names are prepended with a leading underscore to get the asm symbol name in 32-bit code, but not in 64-bit code. For a symbol name that's never referenced from C, yeah either way should be fine. – Peter Cordes Jul 19 '22 at 14:51
  • `proc`/`endp` shouldn't introduce extra instructions if you don't use any MASM stuff that would make that happen, so at least give it a try and see if David's suggestion works. If that works but a simple label doesn't, that would still be an interesting question about why MASM is designed that way when it works in 32-bit. – Peter Cordes Jul 19 '22 at 14:52
  • BTW, your 64-bit code should be using the Windows x64 calling convention, at least for any actually DLL calls. You can make up your own custom calling convention for you own functions if you really want, but for DLLs you need to reserve shadow space and align RSP, and pass the first 4 args in RCX, RDX, R8, R9. (Look at compiler-generated code, e.g. from MSVC on https://godbolt.org/) – Peter Cordes Jul 19 '22 at 14:54
  • @Peter Cordes, I am now trying to assemble the code with "`proc`" and "`endp`", but every time I get an error from User32.dll, though not from kernel32.dll. I think these actions with the stack and registers may help, if they are that necessary for a Windows "`.exe`" application, but I wonder why I need them. How will this "shadow space" on the stack and these register assignments be used? – Developer Jul 19 '22 at 15:10
  • So it assembles with that change? Then yeah, you can move on to debugging all the other problems. See [Shadow space example](https://stackoverflow.com/q/33273797) re: calling a function. Like I said, you can look at MSVC's asm output for examples, too, for C that calls the DLL functions you want to call. As for why you have to pass args in registers: because that's where callees will look for them. That's the standard x64 calling convention. https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions – Peter Cordes Jul 19 '22 at 15:19
  • Yes, it assembles after changing the label to "`proc`" and "`endp`", but Windows gives me an error with code "`c0000005`" at offset "`00000000000714fe`". So I think I do need these actions with the stack and registers, but first I need to read about them in the documentation you sent – Developer Jul 19 '22 at 15:50
  • While I do not understand your objections to proc/endp (as Peter says, they don't add any instructions), if you prefer, you can also just add `public StartOfProgram`. Also, while LoadLibrary can be used to call functions this way, there are alternatives. For example, look at how you are calling LoadLibrary. It's just a function, right? And you called it by declaring the symbol as extern and linking in the appropriate library. The Process Loader took care of loading kernel32.dll for you and looking up the addresses. You should be able to do the same with MessageBoxA. – David Wohlferd Jul 19 '22 at 19:08
  • @David Wohlferd, It works. How did I not think of making the label public myself, as is done in a DLL, for example? Of course, if I write on the command line that some defined label is the entry point, the assembler could work that out by itself, but the MASM developers didn't do that. Finally, I can use cleaner and more natural assembly, almost as it is – Developer Jul 19 '22 at 20:11

1 Answer


The problem was solved in the comments. As @Peter Cordes and @David Wohlferd said, I needed either to make my label public with the "public" directive followed by the label's name, or to rewrite my entry-point label using the "proc" and "endp" directives, each preceded by the label's name.


I prefer the solution with the "public" directive, because I think it is closer to low-level programming. The directive, followed by the label's name, makes the label visible outside the object file. The error apparently appeared because the label was not visible from outside, so it could not be accepted as the entry point named by "/entry:". The tools could have guessed that a label I name as the entry point must be reachable from outside, but apparently the developers of MASM didn't implement that.

Here is the "public" directive as I used it in my program:

public StartOfProgram

I also noticed that I can put it anywhere in my code.
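For context, a complete 64-bit source file built around a public entry label might look like the sketch below. It is an illustration under assumptions rather than my actual program: it only calls ExitProcess, it reuses the kernel32.lib path from the question, it declares the import with `:proc` (the question uses `:near`), and it assumes the same ml64 command line with `/link /subsystem:windows /entry:StartOfProgram`.

```asm
; Minimal sketch for ml64.exe: a plain label made visible to the linker.
; Assumed build line (same style as in the question):
;   ml64.exe Main.asm /link /subsystem:windows /entry:StartOfProgram

includelib  "C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64\kernel32.lib"

extern  ExitProcess:proc

public  StartOfProgram          ; puts the label's symbol into Main.obj so /entry: can resolve it

.code

StartOfProgram:                 ; still an ordinary label, no proc/endp
        sub     rsp, 40         ; 32 bytes of shadow space + 8 to align RSP to 16 bytes
        xor     ecx, ecx        ; uExitCode = 0
        call    ExitProcess     ; does not return

end
```

Without the `public` line the label stays local to the object file, which is the situation that produced LNK2001 in the question.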

Here is an example of using the "proc" and "endp" directives in my program:

StartOfProgram proc     ; Beginning of the procedure

; ... the code itself goes here, inside the procedure

StartOfProgram endp     ; End of the procedure

My code in the question had other errors unrelated to this question; I have corrected them there.
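David Wohlferd's suggestion from the comments, importing MessageBoxA the same way as the kernel32 functions instead of going through LoadLibraryA and GetProcAddress, could be sketched roughly as follows. This is an untested illustration, not the program from the question: it assumes user32.lib sits in the same Windows Kits folder as kernel32.lib, declares the imports with `:proc`, and reserves the stack space once for both calls.

```asm
; Sketch for ml64.exe: MessageBoxA imported through user32.lib.
; Assumed build line:
;   ml64.exe Main.asm /link /subsystem:windows /entry:StartOfProgram

includelib  "C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64\kernel32.lib"
includelib  "C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64\user32.lib"   ; assumed location

extern  MessageBoxA:proc
extern  ExitProcess:proc

public  StartOfProgram

.data

        text    db  'Hello world', 0
        header  db  'Title of hello world', 0

.code

StartOfProgram:
        sub     rsp, 40             ; shadow space (32) + 8 for 16-byte alignment, reserved once

        xor     ecx, ecx            ; hWnd      = NULL
        lea     rdx, text           ; lpText
        lea     r8, header          ; lpCaption
        xor     r9d, r9d            ; uType     = 0 (MB_OK)
        call    MessageBoxA

        xor     ecx, ecx            ; uExitCode = 0
        call    ExitProcess         ; does not return

end
```

With this layout the process loader maps user32.dll and fills in the address of MessageBoxA before the entry point runs, so the handle and address variables are no longer needed.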

  • @Michael Petch, we are talking about a problem in the 64-bit version of this program. Only 32-bit programs can name the entry-point label after the "`end`" directive. In 64-bit programs MASM prohibits this and only allows a bare "`end`" directive, so the entry point can only be set on the MASM command line with the linker option "`/entry:`" – Developer Jul 19 '22 at 22:52
  • Ah, that would be why you can use a non-exported (`public`) symbol in the 32-bit version, because the `end foo` creates metadata in the `.obj` file with the appropriate point in the code. Without that, you need a public label to create a symbol that can be seen externally for `/entry:foo` to find it. So it all makes sense. At least in hindsight, the question should have mentioned that 32-bit didn't use a `/entry:` arg at all, due to using that functionality of `end foo` – Peter Cordes Jul 19 '22 at 23:59
  • While `public` might 'feel' more low level, I doubt it actually changes the output. Still, if "low level" is what you are after, perhaps take a look at [this](https://stackoverflow.com/a/65434734/2189500). It doesn't even use a linker, with every output byte being specified by the assembler. While I don't recommend that you write production code this way, it was interesting and educational. It also shows how to call MessageBoxA under x64, so there's that. Note that by specifying an import table (something the linker usually does for you), it doesn't need Loadlibrary/GetProcAddress. – David Wohlferd Jul 20 '22 at 00:00
  • @DavidWohlferd: I'd say if you want something more "low level" without the risk of the assembler doing magic weird stuff behind your back, use NASM! It doesn't have `proc`/`endp` at all, just labels (and `global foo` instead of `public foo`). And it has a more consistent syntax where `[]` is always a memory reference, but anything without `[]` never is. Unlike MASM's nonsense ([Confusing brackets in MASM32](https://stackoverflow.com/q/25129743)). Also NASM is nicer with multi-character constants as integer literals just working in source byte order. – Peter Cordes Jul 20 '22 at 00:04
  • @PeterCordes: If you look at the link I provided, it uses NASM. – David Wohlferd Jul 20 '22 at 00:05
  • @DavidWohlferd: Ah right. But the main point there is rolling your own PE32+ executable metadata with `db` / `dd` / `dq` directives. That's far more low-level than you need to go, I meant just using NASM normally, with `nasm -f win64 foo.asm` to make `.obj` files. – Peter Cordes Jul 20 '22 at 00:09
  • @PeterCordes, I chose MASM because it is developed by Microsoft and, I think, they have an interest in keeping it up to date. Being Microsoft, they can use it in their own programs and operating systems, and they created Visual Studio (which, I think, they use themselves), which uses an assembler, at least for C++. Other assemblers are made by other people, who can stop supporting and developing them at any time, unlike Microsoft, a big organisation whose main business is software. – Developer Jul 20 '22 at 08:30
  • @PeterCordes, So that is the reason I use MASM and not other assemblers. I understand that other assemblers can be better in some respects, but the danger of rewriting a whole program because support for one assembler ended and I had to switch to another is, I think, not worth it – Developer Jul 20 '22 at 08:30
  • @PeterCordes, And an additional reason: since Microsoft makes things like operating systems and the Windows documentation, it is more likely that they will base their documentation and other material on MASM – Developer Jul 20 '22 at 09:03
  • NASM has multiple developers, and doesn't need much maintenance beyond adding support for new instruction sets (mostly new AVX-512 extensions). I'm not particularly worried about it getting abandoned; there are enough users that someone would step up. Or at *least* write a new assembler compatible with the same syntax. The object-file formats for Windows, Linux, and MacOS are pretty stable at this point. But if you'd rather use MASM which makes it hard to port any of your code to another OS, go ahead. At least JWASM exists to assemble it without proprietary software. – Peter Cordes Jul 20 '22 at 13:55
  • @PeterCordes, "_But if you'd rather use MASM which makes it hard to port any of your code to another OS, go ahead_" Do you mean that NASM can translate x86 assembly code to ARM, or OS API functions to their analogs in another OS, or what? Because I think it can't translate Windows API functions to their MacOS analogs, though it might conceivably translate x86 to ARM. Or do you mean that MASM syntax differs from NASM syntax, and it can be hard to translate from one to the other across different OSes? – Developer Jul 24 '22 at 21:57
  • @Developer: A common use-case for hand-written asm is performance of some number-crunching code. e.g. x264 and x265 video encoders have various hand-written asm routines written in NASM syntax. With a few NASM macros to adapt for the calling-convention differences between Windows and non-Windows, the same code can work on x86-64 Linux, MacOS, Windows, etc. This code doesn't make any system-calls, of course; code that calls system libraries / system-calls is inherently non-portable. – Peter Cordes Jul 24 '22 at 22:06