I'm trying to implement an injection of my 64-bit DLL into a 64-bit process. My host process calls CreateRemoteThread with a thread subroutine pointing to LoadLibrary. The DLL later unloads itself "from within" by calling FreeLibraryAndExitThread.

My goal is to know whether the injected LoadLibrary call succeeded. Unfortunately, I can't use GetExitCodeThread from my (host) process for that, since the 64-bit HMODULE returned by the remote thread is truncated to a DWORD. And I don't want to use the Tool Help APIs, as they would introduce a race condition.

So I was wondering about the lower 32 bits of the HMODULE returned by LoadLibrary in a 64-bit process: can I reliably assume that they will not be all zeros for a valid handle?

PS. I don't need the HMODULE handle itself; all I need to know is whether LoadLibrary succeeded.

Edit. The call from my host process is done as such (in a very concise pseudo-code -- no error checking):

CreateRemoteThread(hProcess, 0, 0, 
  GetProcAddress(GetModuleHandle(L"kernel32.dll"), "LoadLibraryW"),
  pVmAddressOfMyDllsPathWrittenWith_WriteProcessMemory, 0, 0);
MikeF
  • *"`CreateRemoteThread` with a thread subroutine pointing to `LoadLibrary`"* - Great. Now you have two problems. Had you opted to implement a real solution (i.e. use a real thread procedure), you wouldn't have to solve a problem for which there is no solution. – IInspectable Oct 31 '17 at 08:40
  • It used to be a true handle back in the 16-bit days. Not anymore, it is now simply the base address of the load module in memory. You cannot make hard assumptions about its value with ASLR around. LoadLibrary() returns BOOL, never ignore it. – Hans Passant Oct 31 '17 at 08:47
  • @IInspectable: Well, I can't use "real" thread procedure since I can't resolve WinAPI calls and static string offsets in its code to inject it into a remote process. – MikeF Oct 31 '17 at 08:48
  • @HansPassant: If it was `BOOL` it'd be easy to check :) – MikeF Oct 31 '17 at 08:49
  • Before injecting into a 64-bit process from WOW64, you first need to enter 64-bit mode in your own process. Only from there can you do it. – RbMm Oct 31 '17 at 08:50
  • `LoadLibrary` is at a 64-bit address. You simply cannot use it with the 32-bit `CreateRemoteThread` API. – RbMm Oct 31 '17 at 08:53
  • That doesn't make sense. If you can pass `LoadLibrary`'s address as the thread procedure, then surely you can inject code that calls this API (for which the address is apparently known), and turn the `HMODULE` into a boolean `DWORD` value on return. Unless you are not disclosing the entire story, there is nothing that would prevent you doing it the right way (for some definition of *"right"* when dealing with code injection anyway). – IInspectable Oct 31 '17 at 08:54
  • @IInspectable: In a very concise pseudo-code I do this: `CreateRemoteThread(hProcess, 0, 0, GetProcAddress(GetModuleHandle(L"kernel32.dll"), "LoadLibraryW"), pVmAddressOfMyDllsPathWrittenWith_WriteProcessMemory, 0, 0);` and thus I cannot convert returned `HMODULE` to `BOOL` – MikeF Oct 31 '17 at 08:57
  • @MikeF - your pseudo-code is completely wrong for a WOW64 process – RbMm Oct 31 '17 at 08:59
  • @RbMm: Hmm. Can you explain in a separate answer? – MikeF Oct 31 '17 at 09:00
  • @MikeF - but I have already said this several times: the address of the **64-bit** `LoadLibraryW` is a 64-bit address. You cannot pass it to the 32-bit `CreateRemoteThread` API from your WOW64 process. Injection from a WOW64 process into a 64-bit process is possible (I have done it myself), but it is hard and requires a lot of knowledge. You first need to go through the 64-bit gate in your process, and only from 64-bit shellcode in your process can you do the injection. – RbMm Oct 31 '17 at 09:03
  • @RbMm: It is not a WOW64 process. Everything is 64-bit. Please refresh the page. I rephrased my original question. (I guess SO doesn't update the page automatically for you guys.) – MikeF Oct 31 '17 at 09:05
  • I understand what you are doing. I do not understand, why you apparently cannot do it the right way. You already injected data into the remote process. Why do you believe that you cannot inject code that calls `LoadLibrary` and returns a boolean value from a *real* thread procedure. – IInspectable Oct 31 '17 at 09:07
  • @IInspectable: Sure. I'm willing to learn. Can you show me how to do it? – MikeF Oct 31 '17 at 09:08
  • @MikeF - understood. In this case the thread should start not at the `LoadLibrary` address but at a tiny piece of shellcode that first calls `LoadLibrary` and then, if it fails, calls `GetLastError` and returns the error code from the thread. – RbMm Oct 31 '17 at 09:09
  • @RbMm: Sure, but how do I get that "tiny shell code" into the running remote process? – MikeF Oct 31 '17 at 09:10
  • @MikeF - with `WriteProcessMemory`, of course :) – RbMm Oct 31 '17 at 09:11
  • @RbMm: OK, I can write machine code for the most part & can probably use relative `lea` instructions to reference strings. (Obviously I will have to forget about C++ or even C. In that case it's just raw assembly.) But still how do I resolve WinAPI offsets for the `call` instructions? – MikeF Oct 31 '17 at 09:14
  • @MikeF - but how do you resolve the `LoadLibraryW` address? You simply need to store two addresses, `LoadLibrary` and `GetLastError`, in the shellcode body and use them. The task is really not hard; I have implemented this many times. – RbMm Oct 31 '17 at 09:16
  • @RbMm: Hmm. Yeah, good idea, dude! I didn't realize that `GetLastError` is also a kernel32.dll function so I can get its address from the host process. Yep, good suggestion. It may work. It's just such a pain in the rear to write assembly code in a 64-bit project in Visual Studio. Anyway, will try tomorrow. Thanks guys for all your help! – MikeF Oct 31 '17 at 09:19
  • Sorry, it still doesn't let me upvote anything. It sucks to be a noob :) – MikeF Oct 31 '17 at 09:20

2 Answers

Can I reliably assume that its lower 32-bits will not be 0's for a valid handle?

No, you cannot. An HMODULE is the same thing in 64-bit as it is in 32-bit: it is the base address of the loaded module. So there is no reason why a valid HMODULE would have to have non-zero low-order bits.

It's very simple to confirm this. Create a 64-bit DLL with an IMAGEBASE set to, for instance, 0x0000000100000000. Load that DLL and inspect the value of the returned HMODULE.
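
A minimal portable sketch of the arithmetic behind this, assuming the thread's return value is narrowed to 32 bits the same way a plain integer cast narrows it. `truncatedExitCode` is a hypothetical helper, and both base addresses are made-up examples rather than values observed in a real process.

```cpp
#include <cstdint>

// Hypothetical helper: models what the host would read back as the exit code
// when a thread procedure returns a 64-bit module base (low 32 bits only).
constexpr std::uint32_t truncatedExitCode(std::uint64_t moduleBase)
{
    return static_cast<std::uint32_t>(moduleBase);
}

// Illustrative base addresses (assumptions, not real observations).
constexpr std::uint64_t kBaseAt4GiB  = 0x0000000100000000ULL; // the image base suggested above
constexpr std::uint64_t kAnotherBase = 0x00007FFB12340000ULL;

static_assert(kBaseAt4GiB != 0, "the module handle itself is non-null");
static_assert(truncatedExitCode(kBaseAt4GiB) == 0, "yet its low 32 bits are all zero");
static_assert(truncatedExitCode(kAnotherBase) == 0x12340000u, "other bases keep a non-zero low half");
```

With the first base, a successful load would be indistinguishable from a failure if only the exit code were inspected.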

David Heffernan
  • Yes, you are correct. Sorry, I misspoke. (I'll correct my question.) Of course the host process is 64-bit. Still, the issue remains. How to know that injected LoadLibrary succeeded? – MikeF Oct 31 '17 at 08:45
  • Not entirely moot. Even when using a 64-bit host process, a thread's return value is 32 bits wide, so there is no way to use it to return a pointer-sized value. – IInspectable Oct 31 '17 at 08:46
  • @MikeF - are you using the `LoadLibrary` address from the *x64* *kernel32.dll*? This address can be (and, for example, on Windows 10 always is) a 64-bit address with a non-zero high part, so you simply cannot use it in a `CreateRemoteThread` call. The only way is to first enter 64-bit mode from your process and call the 64-bit ntdll API from there. – RbMm Oct 31 '17 at 08:48
  • @RbMm: Correct. There's no `WOW64` involved. I misspoke in my original question. (Now corrected.) All processes and DLLs are 64-bit. And yes, I take the `LoadLibrary` address from the 64-bit process, so I'm assuming it comes from the x64 kernel32.dll – MikeF Oct 31 '17 at 08:53
  • @MikeF - in which API call do you use the 64-bit address of `LoadLibrary`? Do you load the 64-bit kernel32.dll into your process? – RbMm Oct 31 '17 at 08:55
  • OK, so when you said, *I'm trying to implement an injection of my 64-bit DLL from a **32-bit process** into a 64-bit process. My **32-bit process** calls CreateRemoteThread with a thread subroutine pointing to LoadLibrary.* you meant to write *I'm trying to implement an injection of my 64-bit DLL from a **64-bit process** into a 64-bit process. My **64-bit process** calls CreateRemoteThread with a thread subroutine pointing to LoadLibrary.* – David Heffernan Oct 31 '17 at 08:58
  • @DavidHeffernan: True, I got confused. Please refresh the page. I updated it. The 64-bit DLL is injected from and to 64-bit process only. (It's too late where I'm in. Sorry for the confusion, guys!) – MikeF Oct 31 '17 at 09:03
  • Just as a quick follow-up. [There's evidently a hack](http://blog.rewolf.pl/blog/?p=102) that allows running mixed (32-bit and 64-bit) code in one process. I'm not sure if it still works though. – MikeF Nov 01 '17 at 01:58

Instead of calling CreateRemoteThread with a thread subroutine pointing to LoadLibraryW, we can inject a tiny piece of shellcode into the remote process that first calls LoadLibraryW and then, if it fails, GetLastError. As a result the remote thread returns the error code (0 if there was no error), so you know exactly whether LoadLibrary succeeded and, if not, you have the error code. The 64-bit asm code can be:

CONST segment

SHELLDATA struct
    LoadLibrary DQ ?
    GetLastError DQ ?
SHELLDATA ends

public RemoteThreadProc_begin
public RemoteThreadProc_end

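RemoteThreadProc_begin:
RemoteThreadProc proc
    ; 3 nops + 5-byte call = 8 bytes, so SHELLDATA starts at offset 8 of the stub
    nop
    nop
    nop
    call @@0                         ; pushes the address of SHELLDATA
    ___ SHELLDATA <>
@@0:
    xchg [rsp],rbp                   ; rbp -> SHELLDATA, caller's rbp saved on the stack
    sub rsp,20h                      ; shadow space for the callees
    call SHELLDATA.LoadLibrary[rbp]  ; rcx still holds the thread parameter (DLL path)
    test rax,rax
    jz @@1
    xor eax,eax                      ; success: exit code 0
@@2:
    add rsp,20h
    pop rbp                          ; restore the caller's rbp
    ret
@@1:
    call SHELLDATA.GetLastError[rbp] ; failure: exit code = last error
    jmp @@2
RemoteThreadProc endp
RemoteThreadProc_end: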

CONST ends
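
For readers who do not read MASM, the control flow of the stub above can be modelled in portable C++ roughly as follows. This is a sketch only: `ShellData`, `LoadLibraryFn`, `GetLastErrorFn` and `remoteThreadProcModel` are hypothetical names, and the function-pointer types are stand-ins for the real `LoadLibraryW` and `GetLastError` signatures so that the model compiles without the Windows SDK.

```cpp
#include <cstdint>

// Stand-ins for the two kernel32 entry points the stub calls through its
// embedded table (hypothetical typedefs, not the Windows SDK declarations).
using LoadLibraryFn  = void* (*)(const wchar_t* path);
using GetLastErrorFn = std::uint32_t (*)();

// Mirrors the two pointer slots the host fills in before writing the stub.
struct ShellData {
    LoadLibraryFn  loadLibrary;
    GetLastErrorFn getLastError;
};

// Same decision as the asm: return 0 when the load succeeds, otherwise
// return whatever the error query reports.
inline std::uint32_t remoteThreadProcModel(const ShellData& data, const wchar_t* dllPath)
{
    if (data.loadLibrary(dllPath) != nullptr)
        return 0;                  // corresponds to: xor eax,eax
    return data.getLastError();    // corresponds to: call SHELLDATA.GetLastError[rbp]
}
```

The point of the design is that the value handed back to the host is already a 32-bit status, so nothing is lost when it travels through the thread exit code.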

And the C++ code:

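#include <windows.h>
#include <malloc.h>   // alloca
#include <string.h>   // memcpy
#include <wchar.h>    // wcslen

extern "C"
{
    // Start and end labels of the stub, exported from the asm file.
    extern UCHAR RemoteThreadProc_begin[], RemoteThreadProc_end[];
}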

enum INJECT_PHASE {
    fOpenProcess, fVirtualAlloc, fWriteProcessMemory, fCreateRemoteThread, fMax
};

ULONG injectDll(ULONG dwprocessId, PCWSTR dllFilePath, INJECT_PHASE& phase)
{
    ULONG err = 0;

    struct SHELLDATA 
    {
        __int64 code;
        PVOID LoadLibrary, GetLastError;
    };

    if (HANDLE hProcess = OpenProcess(PROCESS_CREATE_THREAD|PROCESS_VM_OPERATION|PROCESS_VM_WRITE, FALSE, dwprocessId))
    {
        SIZE_T cbStr = (wcslen(dllFilePath) + 1) * sizeof(WCHAR);
        SIZE_T cbCode = ((RemoteThreadProc_end - RemoteThreadProc_begin) + sizeof(WCHAR) - 1) & ~(sizeof(WCHAR) - 1);

        union {
            PVOID RemoteAddress;
            PBYTE pbRemote;
            PTHREAD_START_ROUTINE lpStartAddress;
        };

        if (RemoteAddress = VirtualAllocEx(hProcess, 0, cbStr + cbCode, MEM_COMMIT, PAGE_EXECUTE_READWRITE))
        {
            union {
                PVOID pv;
                PBYTE pb;
                SHELLDATA* ps;
            };

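            // Build a local image of the remote block: stub code first, then the DLL path.
            pv = alloca(cbStr + cbCode);

            memcpy(pv, RemoteThreadProc_begin, cbCode);
            memcpy(pb + cbCode, dllFilePath, cbStr);

            // Patch the two pointer slots inside the stub. This relies on kernel32.dll
            // being mapped at the same address in the target process.
            HMODULE hmod = GetModuleHandle(L"kernel32");
            ps->GetLastError = GetProcAddress(hmod, "GetLastError");
            ps->LoadLibrary = GetProcAddress(hmod, "LoadLibraryW");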

            if (WriteProcessMemory(hProcess, RemoteAddress, pv, cbStr + cbCode, 0))
            {
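                // Start at the stub; the thread parameter (rcx on entry) is the remote DLL path.
                if (HANDLE hThread = CreateRemoteThread(hProcess, 0, 0, lpStartAddress, pbRemote + cbCode, 0, 0))
                {
                    phase = fMax;
                    WaitForSingleObject(hThread, INFINITE);
                    // Exit code is 0 if LoadLibraryW succeeded, otherwise its GetLastError value.
                    GetExitCodeThread(hThread, &err);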

                    CloseHandle(hThread);
                }
                else
                {
                    phase = fCreateRemoteThread;
                    err = GetLastError();
                }
            }
            else
            {
                phase = fWriteProcessMemory;
                err = GetLastError();
            }

            VirtualFreeEx(hProcess, RemoteAddress, 0, MEM_RELEASE);
        }
        else
        {
            phase = fVirtualAlloc;
            err = GetLastError();
        }

        CloseHandle(hProcess);
    }
    else
    {
        phase = fOpenProcess;
        err = GetLastError();
    }

    return err;
}
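
The size arithmetic in `injectDll` (stub rounded up to a `WCHAR` boundary, followed by the NUL-terminated path) can be checked in isolation. The sketch below is a portable model: `roundUp`, `RemoteLayout` and `layoutFor` are hypothetical names, the character size of 2 is an assumption matching Windows `WCHAR`, and the stub size of 49 bytes is an arbitrary example.

```cpp
#include <cstddef>

// Rounds `size` up to a multiple of `align` (a power of two); the same
// computation as cbCode above with align = sizeof(WCHAR).
constexpr std::size_t roundUp(std::size_t size, std::size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

struct RemoteLayout {
    std::size_t pathOffset; // where the DLL path starts inside the remote block
    std::size_t totalSize;  // bytes to allocate and write
};

// `pathChars` excludes the terminating NUL; `charSize` is 2 for Windows WCHAR.
constexpr RemoteLayout layoutFor(std::size_t stubSize, std::size_t pathChars,
                                 std::size_t charSize = 2)
{
    return RemoteLayout{ roundUp(stubSize, charSize),
                         roundUp(stubSize, charSize) + (pathChars + 1) * charSize };
}

static_assert(roundUp(49, 2) == 50, "an odd stub size is padded by one byte");
static_assert(layoutFor(49, 10).totalSize == 50 + 22, "stub plus 11 two-byte characters");
```

`pathOffset` corresponds to `cbCode` (the value added to `pbRemote` for the thread parameter) and `totalSize` to `cbStr + cbCode`.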
RbMm
  • This might solve the problem behind the question, but it doesn't actually answer the question. – David Heffernan Oct 31 '17 at 10:01
  • @DavidHeffernan - yes, of course. You have already given the correct answer to the formal question – RbMm Oct 31 '17 at 10:02
  • @DavidHeffernan: Yes, I marked your post as the answer. Still I appreciate RbMm's take on it & the code sample. (Too bad I can't upvote anything though.) – MikeF Oct 31 '17 at 17:05
  • @RbMm: Question about your asm. Are we supposed to save context for a thread function? I'm talking about rbx, rbp, rdi, rsi, rsp & r12 through r15 registers. – MikeF Oct 31 '17 at 17:07
  • @MikeF - of course any function must save and restore non-volatile registers. For a thread entry point this is arguably not mandatory, because `ExitThread` is called right after it returns. But it is better to save them anyway, and it is not hard. – RbMm Oct 31 '17 at 17:11
  • Also shouldn't the path to a dll for the `LoadLibrary` call be in the `rcx` register as the 1st parameter? – MikeF Oct 31 '17 at 17:13
  • @MikeF - yes, of course. On x64 the first argument is always in *rcx* – RbMm Oct 31 '17 at 17:14
  • @MikeF - *rcx* is already set by `CreateRemoteThread`: it is `pbRemote + cbCode`, which points exactly at the library name – RbMm Oct 31 '17 at 17:16
  • @RbMm: Oh, I see, you're re-using the 1st entry parameter into the thread function itself, which is already in `rcx`. Hah. Very clever, dude! – MikeF Oct 31 '17 at 17:51
  • @MikeF - yes, *rcx* is already set. I point *rbp* at `SHELLDATA` and save its old value with `xchg [rsp],rbp` – RbMm Oct 31 '17 at 17:57
  • @RbMm: OK. My idea was to use `lea rcx, [relative_offset]` instruction and add DLL file path right after the code segment. But passing it as a parameter to `CreateRemoteThread` should work as well I guess. – MikeF Oct 31 '17 at 18:00
  • @MikeF - yes, the pointer to the *filename* is passed via `CreateRemoteThread`. The pointer to `SHELLDATA`, which holds the pointers to `LoadLibrary` and `GetLastError`, is obtained via `call @@0 .. xchg [rsp],rbp`. This is a well-known technique for position-independent code – RbMm Oct 31 '17 at 18:07
  • @RbMm: Yeah I see how you got the address of `SHELLDATA` by using that `call @@0` instruction. Can I ask though, why are you adding three `nop`s at the beginning of `RemoteThreadProc`? – MikeF Oct 31 '17 at 18:14
  • @MikeF - for correct alignment. `SHELLDATA` must be aligned on the pointer size, i.e. on 8 bytes; `call @@0` is 5 bytes long, so 3 more bytes are needed for alignment. – RbMm Oct 31 '17 at 18:19
  • @RbMm: Oh, so instead of doing `xchg [rsp],rbp` and then `add rbp, 3` you added 3 `nop`s. OK, should work as well. I didn't know about that alignment thing though. Is it something that's needed because you're mixing data and code segments? – MikeF Oct 31 '17 at 18:25
  • @MikeF - no, the assembler never repacks data on its own; it works as if we used `#pragma pack(1)`. I simply want the pointers to be correctly aligned in memory, on 8 bytes – RbMm Oct 31 '17 at 18:28
  • @RbMm: Well, thanks for your help! One thing I would add to your asm code though is a check for `GetLastError` returning 0. Microsoft is known for "eating up" error codes. Something akin to: `call SHELLDATA.GetLastError[rbp]`, `test eax, eax`, `jnz @@2`, `dec eax`, `jmp @@2`. Just as a precaution. So if you get -1 then you'll know what happened. – MikeF Oct 31 '17 at 18:51
  • @MikeF - if `LoadLibrary` fails, `GetLastError` will never be 0. This is not a GUI call. – RbMm Oct 31 '17 at 18:54
  • @RbMm: Well, you have more confidence in Microsoft's coding skills than I do :) – MikeF Oct 31 '17 at 18:55
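
The padding discussed in the comments above (three `nop`s in front of the 5-byte `call`) can be expressed as compile-time checks on the host-side view of the stub. This is a sketch under stated assumptions: `ShellDataView` is a hypothetical mirror of the answer's C++ `SHELLDATA` struct, and the instruction sizes are the ones quoted in the comments (1 byte per `nop`, 5 bytes for a near `call` with a 32-bit displacement).

```cpp
#include <cstddef>
#include <cstdint>

// Host-side view of the start of the stub: 8 bytes of code followed by the
// two pointer slots that the host patches before writing the block.
struct ShellDataView {
    std::uint64_t code;         // nop, nop, nop, call rel32
    void*         loadLibrary;
    void*         getLastError;
};

constexpr std::size_t kNopBytes  = 3; // three 1-byte nop instructions
constexpr std::size_t kCallBytes = 5; // opcode + 32-bit relative displacement

static_assert(kNopBytes + kCallBytes == sizeof(std::uint64_t),
              "the padding fills exactly one 8-byte slot");
static_assert(offsetof(ShellDataView, loadLibrary) == 8,
              "first pointer starts right after the 8 code bytes");
static_assert(offsetof(ShellDataView, getLastError) == 8 + sizeof(void*),
              "second pointer follows immediately");
```

If the prologue ever changes length, the first assertion is the one that would need revisiting, since the pointer slots would no longer line up with the struct.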