Given: prog.c with an entry point prog

I normally do

cl.exe /MD /LD /Fe"prog.dll" /Fo"prog" "prog.c" /link ext.lib

or

cl.exe /c /MD /Fo"prog.obj" "prog.c"
cl.exe /MD /LD /Fe"prog.dll" "prog.obj" /link ext.lib

In both cases, the resulting prog.dll works fine.

Now I did the following to get an asm file instead of the obj file:

cl.exe /c /MD /Fa"prog" "prog.c"

This "works" so far, too. But I cannot figure out how to make a dll of this file.

Tried:

ml.exe /c /Cx /coff prog.asm
cl.exe /MD /LD /Fe"prog.dll" "prog.obj" /link ext.lib

Result: prog.dll without entry point prog

Tried again:

ml.exe /c /Cx /coff prog.asm
cl.exe /MD /LD /Fe"prog.dll" "prog.obj" /link /entry:prog ext.lib

Result: a linker warning that the entry point _prog is not stdcall with 12 bytes of arguments, and a linker error about the unresolved symbol _memcpy.

Question: Is there any way to compile the asm file which cl.exe generates by /Fa to a dll (preferably via cl.exe, if not possible with ml.exe)?

Simon Sobisch
  • You can't in general assemble the output of the Microsoft compiler and get something that works. MASM doesn't support all the features the compiler uses, so the assembly output is only representative of the object file the compiler creates. – Ross Ridge Dec 07 '16 at 00:22
  • @RossRidge "the assembly output is only representative of the object file the compiler creates" - this should be no problem as I can compile the object file the compiler creates when I do it directly - I just don't know *how* to compile/link it to get it working from asm file. – Simon Sobisch Dec 07 '16 at 00:41
  • No, the fact that you were able to assemble the assembly file that the compiler created into an object file doesn't necessarily mean you got something that works. What you're trying to do isn't supported by Microsoft's compiler. You should just use the object file that compiler creates directly. – Ross Ridge Dec 07 '16 at 00:51
  • The LLVM toolchain supports first creating a .asm and then assembling it. There is even a tool and test set now that ensures that this is equivalent to directly producing an output (Check CFC), see http://www.snsystems.com/tech-blog/2015/04/22/verifying-game-developer-assumptions/ – masterxilo Feb 20 '17 at 19:06
  • The Nvidia nvcc CUDA compiler can also compile to their assembly language ptx as an intermediate step and then go on compiling from that, instead of giving an object file directly. I think looking at this is educational. – masterxilo Feb 20 '17 at 19:09

2 Answers

Is there any way to compile the asm file which cl.exe generates by /Fa to a dll (preferably via cl.exe, if not possible with ml.exe)?

No:

  1. The C/C++ compiler (cl.exe) cannot assemble assembly-code input. It takes only C or C++ source code as input. The assembler is MASM (ml.exe).
  2. The assembly-code output of cl.exe cannot, in general, be fed directly into MASM. In some cases, it is not even valid assembly code. In other cases, there are directives, keywords, and other things emitted in the code that MASM doesn't directly support. Things get especially hairy if the C/C++ source uses exceptions. The listing file is for informational purposes only.

It is very unclear to me why you are even wanting to do this in the first place. If your source code is either C or C++, and can be compiled and linked by MSVC, then what is the point of introducing the additional intermediate step of converting it to and from assembly language? Just use cl.exe directly to make a DLL.

If you absolutely must do this, you will have to take the ASM listing file generated by MSVC and manually clean it up before running it through MASM. You can make this clean-up task easier by turning off whole program optimization, turning off exception handling, turning off security checks/cookies, and indicating to the linker that the image does not contain safe SEH handlers. Note that some of these may break or change the behavior of your code! You'll also need to add EXTERN definitions for functions that you call from runtime libraries.

Simon Sobisch
Cody Gray - on strike
  • I've corrected the typo in the question and removed the explanation in your answer. Here comes the reason for doing this (and for the typo): I'm building some tests for GnuCOBOL which uses an underlying C compiler (mostly gcc, then cl.exe, then clang, then icc). This way it can create an assembly file, too. To test that this really works and GnuCOBOL knows how to use the C compiler I've fed the resulting assembly file back to the C compiler (works fine for gcc, obviously cannot for cl). I'll adjust the test checking only if the asm file exists when cl.exe is used. – Simon Sobisch Dec 07 '16 at 14:43
  • If the question gets some upvotes and therefore stays in here (nice answer btw) I'll likely link here for the reasons... – Simon Sobisch Dec 07 '16 at 14:47
  • Still not sure I understand how that's a valuable test. It doesn't test the correctness of the output, and it has nothing to do with GnuCOBOL. It just tests the internal consistency of the C compiler. Unless you're saying that the point of GnuCOBOL is to translate COBOL code into either C or x86 assembly? What would be the point of that, as opposed to just compiling directly to object code? So you can write something in COBOL, translate it to C or assembly, and integrate it with an existing project written in C/assembly? – Cody Gray - on strike Dec 07 '16 at 14:53
  • You can compile C / asm to a module/object/executable without knowing the system specifics *and* linking it to the GnuCOBOL runtime library directly with the GnuCOBOL compiler `cobc`. You can (and people do this...) call a function entry of a C / asm file from a COBOL module and combine them (`cobc` creates an object file from the C / asm source first and then links it to the generated COBOL module). The point of the current test is: `cobc` knows how to call the C compiler for doing so and `cobc` knows that s/asm is assembler. Creating it first was just for getting a "valid" asm file. – Simon Sobisch Dec 07 '16 at 15:05
  • I cannot find an "official" statement about the listing being informational-only, but this post claims the same: http://stackoverflow.com/a/7495413/524504 – masterxilo Feb 20 '17 at 19:15

While ASM source generated by the Microsoft C compiler might not be the best input back into MASM, that doesn't mean it won't work, at least in some cases (just maybe not complicated ones). If you look at an ASM file generated by the C compiler, you'll find that someone at Microsoft went to a lot of trouble to insert various "hacky" includes, directives, manual segment definitions and other MASM specifics to give the source file at least a slim chance of being fed back into MASM and producing an assembled result. As long as you set your expectations low, I'd guess that a simple C source file, converted to ASM and then fed back into MASM, should work if you get your command-line options in order.

One caveat to keep in mind: if you use the CRT as you are doing (i.e. the use of memcpy), you'll want to let the linker select the default entrypoint __DllMainCRTStartup@12 from the appropriate CRT .LIB file rather than specifying your own. This allows the CRT to be initialized before your DllMain is called, preventing a crash when you invoke CRT functions that depend on this initialization. That said, with older versions of Visual Studio, such as 7.1 (2003), you could get away with not initializing the CRT, depending on which functions you used, without risking a crash. The newer versions of the C Runtime will throw an exception no matter which CRT function is called if the process has not previously called mainCRTStartup or DllMainCRTStartup.

For educational purposes, let's address the entrypoint problem you described above using MSVC 7.1 (2003), and we'll not worry about initializing the CRT so that you can explicitly specify your own entrypoint. I think you were hitting the following linker warning:

warning LNK4086: entrypoint '_prog@XX' is not __stdcall with 12 bytes of arguments; image may not run

When specifying your own DLL entrypoint, the linker is expecting a DllMain signature (which is 12 argument bytes and stdcall, so the function clears the arguments itself); officially it is:

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
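
To make the "12 bytes" in that warning concrete, here is a minimal sketch of how the decorated name of a 32-bit __stdcall C function is formed: an underscore, the name, an "@", and the number of argument bytes on the stack. The helper names stack_slot_bytes and stdcall_decorated_name are hypothetical and exist only for this illustration; they are not part of any Microsoft tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bytes one argument occupies on the 32-bit x86 stack: every argument
   is rounded up to a multiple of 4 bytes. */
size_t stack_slot_bytes(size_t arg_size)
{
    return (arg_size + 3u) & ~(size_t)3u;
}

/* Hypothetical helper (illustration only): writes the decorated name of a
   32-bit __stdcall C function, "_" + name + "@" + total argument bytes.
   Returns the number of characters written, or -1 if the buffer is too small. */
int stdcall_decorated_name(char *out, size_t out_len, const char *name,
                           const size_t *arg_sizes, size_t arg_count)
{
    size_t total = 0;
    size_t i;
    int n;

    assert(out != NULL && name != NULL);

    for (i = 0; i < arg_count; ++i)
        total += stack_slot_bytes(arg_sizes[i]);

    n = snprintf(out, out_len, "_%s@%lu", name, (unsigned long)total);
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

Called with the name prog and three 4-byte arguments (the DllMain parameter sizes on 32-bit x86), the helper produces _prog@12, which is the symbol the compiler emits as PUBLIC in the listing further down.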

You might implement the entrypoint function as shown in this version of prog.c:

#include <Windows.h>
#include <stdio.h>

#pragma warning (disable:4100) //Warning Level 4: unreferenced formal parameter

int __stdcall prog(DWORD hInst, DWORD dwReason, DWORD dwReserved)
{
    printf("Result of memcmp: %d\n",memcmp("foo","bar",3));
    return(1);
}

Assuming your LIB and INCLUDE environment variables are properly set, you can build the source above with the commands:

cl.exe /nologo /c /MD /Fa./prog.asm prog.c
link.exe /nologo /dll /subsystem:console /entry:prog prog.obj kernel32.lib

You already know you can build the original C source with the C compiler. Focusing on the prog.asm output file generated from the /Fa option, you can build the DLL from the generated ASM source as follows:

ml.exe /c /coff /Cx prog.asm
link.exe /nologo /dll /subsystem:console /entry:prog prog.obj kernel32.lib

Test out your DLL using a simple console loader such as:

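#include <Windows.h>
#include <stdio.h>   /* printf */

int __cdecl main(void)
{
    HMODULE hLib = LoadLibraryA("prog.dll");   /* runs the DLL entrypoint */
    DWORD err = GetLastError();
    /* 32-bit build: the module handle fits in an unsigned int */
    printf("LoadLibrary result: 0x%X / code=0x%X\n", (unsigned int)(UINT_PTR)hLib, (unsigned int)err);
    return 0;
}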

On my machine, both the C and MASM generated DLLs produced the following output:

Result of memcmp: 1
LoadLibrary result: 0x10000000 / code=0x0
Result of memcmp: 1

The generated MSVC 7.1 ASM file produced by the compiler is listed below for reference. Notice how the file refers to itself as a "Listing" :)

; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.6030 

    TITLE   prog.c
    .386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS    ENDS
$$SYMBOLS   SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS   ENDS
_TLS    SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS    ENDS
FLAT    GROUP _DATA, CONST, _BSS
    ASSUME  CS: FLAT, DS: FLAT, SS: FLAT
endif

INCLUDELIB MSVCRT
INCLUDELIB OLDNAMES

_DATA   SEGMENT
$SG74617 DB 'bar', 00H
$SG74618 DB 'foo', 00H
$SG74619 DB 'Result of memcmp: %d', 0aH, 00H
_DATA   ENDS
PUBLIC  _prog@12
EXTRN   __imp__printf:NEAR
EXTRN   _memcmp:NEAR
; Function compile flags: /Odt
_TEXT   SEGMENT
_hInst$ = 8                     ; size = 4
_dwReason$ = 12                     ; size = 4
_dwReserved$ = 16                   ; size = 4
_prog@12 PROC NEAR
; File prog.c
; Line 10
    push    ebp
    mov ebp, esp
; Line 11
    push    3
    push    OFFSET FLAT:$SG74617
    push    OFFSET FLAT:$SG74618
    call    _memcmp
    add esp, 12                 ; 0000000cH
    push    eax
    push    OFFSET FLAT:$SG74619
    call    DWORD PTR __imp__printf
    add esp, 8
; Line 12
    mov eax, 1
; Line 13
    pop ebp
    ret 12                  ; 0000000cH
_prog@12 ENDP
_TEXT   ENDS
END
byteptr
  • Much useful information. Is there a way of not manually adding `BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)` and still get the dll to work? This isn't done when compiling directly either. – Simon Sobisch Dec 08 '16 at 09:12
  • The Windows loader will treat any entrypoint for a DLL as if it has a DllMain signature, which accepts 3 DWORD arguments and the stack is adjusted so that they are removed upon returning. You don't have to use them or reference them, but they'll be on the stack regardless. At the assembly level, the only thing your entrypoint function must do before returning is clear the 3 DWORDs from the stack, which in this case is illustrated by the "RET 12" at the end. If you don't do this, the process will likely crash. – byteptr Dec 08 '16 at 23:16
  • I almost forgot to mention that if you return zero from the DLL entrypoint function (value in EAX before the RET instruction) when it is first called (i.e. DLL_PROCESS_ATTACH == dwReason), you are telling the system that initialization failed and the DLL will be unloaded. You should always return 1. Keep in mind Windows will call your entrypoint multiple times with different values for dwReason. Also, depending on which APIs you call in this entrypoint, you may cause a deadlock. Have a look here: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682583%28v=vs.85%29.aspx – byteptr Dec 08 '16 at 23:38
  • Regarding the above commands from the byteptr post: ml.exe /c /coff /Cx prog.asm, and link.exe /nologo /dll /subsystem:console /entry:prog prog.obj kernel32.lib. I added two additional lib files which were 'ucrt.lib' and 'vcruntime.lib'. The kernel32.lib is not enough for the linker, link.exe, to resolve all the externals in the prog.obj file. – tom Apr 30 '23 at 05:30
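
The entry-point contract described in the comments above (three arguments, a nonzero return on DLL_PROCESS_ATTACH, repeated calls with different reason codes) can be sketched in C as follows. This is an illustration under stated assumptions, not the answer author's code: the typedefs in the non-Windows branch are stand-ins so the sketch compiles anywhere, and failing_entry and simulate_attach are hypothetical helpers that only mimic what the loader does with the return value.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins (assumptions) so the sketch also compiles off Windows. */
typedef void *HINSTANCE;
typedef void *LPVOID;
typedef unsigned long DWORD;
typedef int BOOL;
#define WINAPI
#define TRUE 1
#define FALSE 0
#define DLL_PROCESS_DETACH 0
#define DLL_PROCESS_ATTACH 1
#define DLL_THREAD_ATTACH 2
#define DLL_THREAD_DETACH 3
#endif

/* Entry point with the DllMain signature. On 32-bit x86, WINAPI (__stdcall)
   makes the function remove its 12 bytes of arguments itself (RET 12). */
BOOL WINAPI prog(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    (void)hinstDLL;
    (void)lpvReserved;

    switch (fdwReason) {
    case DLL_PROCESS_ATTACH:
        /* First call: returning FALSE here reports failed initialization. */
        return TRUE;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
    default:
        /* Later calls: nothing to do in this sketch. */
        return TRUE;
    }
}

/* Hypothetical counterexample: reports failure on the first call. */
BOOL WINAPI failing_entry(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    (void)hinstDLL;
    (void)lpvReserved;
    return fdwReason == DLL_PROCESS_ATTACH ? FALSE : TRUE;
}

/* Hypothetical stand-in for the loader's attach step: calls the entry point
   once with DLL_PROCESS_ATTACH and reports whether the DLL would stay loaded. */
int simulate_attach(BOOL (WINAPI *entry)(HINSTANCE, DWORD, LPVOID))
{
    assert(entry != NULL);
    return entry(NULL, DLL_PROCESS_ATTACH, NULL) != FALSE;
}
```

Here simulate_attach(prog) reports that the DLL stays loaded, while simulate_attach(failing_entry) reports that it would be unloaded, which is the behaviour the comment about returning zero describes.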