I have this block of x86 (Intel syntax, MASM32) code, and I wonder why it uses `uhex$(ebx) + 6`. What is the `+6` for? Can someone explain it to me?

.code 
start: 
    ...

WinMain proc hInst:HINSTANCE,hPrevInst:HINSTANCE,CmdLine:LPSTR,CmdShow:DWORD 
    ...
WinMain endp

WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM 
    ...
            .ELSEIF  ax==IDM_GETTEXT 
                ; store user input into buffer
                invoke GetWindowText,hwndEdit,ADDR buffer,512 

                ; convert into hex
                xor eax,eax
                invoke lstrlen, addr buffer
                mov mSize,eax
                mov edx,mSize
                mov esi, OFFSET buffer

                .WHILE i != edx
                    push edx
                    xor ebx,ebx
                    mov bl, [esi]
                    add esi,1
                    mov Value,uhex$(ebx) + 6
                    invoke lstrcat, addr mStr, Value
                    inc i
                    pop edx
                .ENDW

                ...
WndProc endp 
end start

I've tried searching on Google and ChatGPT, but all I found was that `uhex$` converts a 32-bit integer to a hex string. I just need an explanation of `uhex$` and of why `+6` is added.

Peter Cordes
Huy Tran
  • Does that source file define `uhex` as a macro or constant? The destination is a memory operand (a variable with symbol name `Value`) so the source must be an immediate. (I'm assuming that uhex stuff doesn't expand to a register name, because there's no way `+6` makes sense when talking about a register operand.) It can't also be memory because x86 doesn't allow two explicit memory operands for an instruction. – Peter Cordes Aug 28 '23 at 04:45
  • Oh, according to https://redirect.cs.umbc.edu/courses/undergraduate/313/fall07/burt/CMSC313_lectures/Text_IO/text_IO.html, it might assemble to a function call? That's insane, so not one machine instruction at all, like maybe an `invoke` and then `mov Value, eax`? A 32-bit hex string has 8 hex digits, but this loop only loads 1 byte (2 hex digits) at a time. So `+6` would be taking the last 2 bytes of that string. This seems super hacky and obfuscated. Have you tried assembling it and looking at the disassembly to see if it's copying from mem to mem with multiple instructions? x86 can't do `mov mem,mem` – Peter Cordes Aug 28 '23 at 04:52
  • It seems to be doing a hexdump of a memory region, 1 byte at a time. – Peter Cordes Aug 28 '23 at 04:52
  • @PeterCordes I don't see `uhex` defined as a macro or constant anywhere. While debugging the code, I found that after converting to a hex string the output has 8 characters. For example: "a" -> "00000061". So the `+6` is for skipping the 6 leading "0" characters of that output. But how does that make any sense? – Huy Tran Aug 28 '23 at 04:53
  • @PeterCordes I don't really know how `mov mem, mem` works, but I'll read more about it... But why `+6`? Is there anything to read about those "super hacky and obfuscated" things? I've tried assembling it and it works completely fine (for me) – Huy Tran Aug 28 '23 at 04:59
  • It's not really mem, mem - it is mem, imm32 where imm32 happens to be the address of the hex string created by `uhex` – Michael Petch Aug 28 '23 at 05:53
  • `mov mem,mem` does *not* work in x86. Oh, so it's an address, despite not using the `OFFSET` keyword. Into some static buffer somewhere written by the call to `uhex`, I guess. But that makes sense, now that I look and see it's passing it as an arg to `lstrcat`. (So this is just hilariously inefficient; not only does it do 8 hex digits and only take 2 of them every iteration, it also has to scan to find the end of the string to append, which is O(n^2). Totally defeating the purpose of writing asm by hand. If you want to write bad code like this, pick a higher-level language.) – Peter Cordes Aug 28 '23 at 06:22
  • This `uhex` thing seems to be a MASM32 macro or extension that expands to extra instructions, and replaces the original instruction with an immediate I guess. That's what I mean by "super hacky", since most x86 assemblers don't have pseudo-instructions (where one instruction expands to multiple), and this is triggered by an operand, not a mnemonic. (Although to be fair, ARM's `ldr r0, =12345` is also triggered by operand syntax. But it doesn't make a function call that modifies a static buffer somewhere. That's widely seen as obsolete API design since it's not thread safe.) – Peter Cordes Aug 28 '23 at 06:25
  • Oh, okay, that helped a lot... Now I understand the problem. Thank you! – Huy Tran Aug 28 '23 at 06:42
  • @PeterCordes : although `uhex$` uses a static buffer, it is a unique buffer created for every invocation of the `uhex$` macro, which is why the macro declares `rvstring` as a `LOCAL` label. That will generate a unique label in place of `rvstring`. The macro places the buffer associated with that label in the data section and then switches back to the code section. – Michael Petch Aug 28 '23 at 15:28
  • For the record, [How to convert a binary integer number to a hex string?](https://stackoverflow.com/q/53823756) shows how to convert to hex in 4-byte chunks, or in 8 or 16-byte chunks with SIMD. It should be obvious to loop over input / output buffers doing the conversion, maybe just loading one byte at a time and splitting it to two nibbles, basically "unrolling" a 2-nibble inner loop. – Peter Cordes Aug 28 '23 at 19:52
  • The code you found is just so inefficient in so many ways, like using push/pop for an EDX loop counter. It could have used call-preserved EBX for the loop counter, and EDX or EAX for the temporary to load into, then it wouldn't have to push/pop anything inside the loop. And if you want a zero-extending byte load, use `movzx edx, byte ptr [esi]`, not xor / `mov dl, [esi]`. – Peter Cordes Aug 28 '23 at 19:55
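
The alternative suggested in the comments above (load one byte, split it into two nibbles, and write both digits straight to an output pointer) can be sketched in C as follows. This is only an illustration of the idea, not code from the question: the function name `bytes_to_hex`, the digit table, and the choice of uppercase digits are assumptions. Because it keeps a running output pointer, it avoids rescanning the destination string on every iteration the way repeated `lstrcat` calls do.

```c
#include <stddef.h>

/* Hypothetical helper: converts len bytes from src into 2*len hex digits
   followed by a terminating NUL. out must hold at least 2*len + 1 bytes. */
static void bytes_to_hex(const unsigned char *src, size_t len, char *out) {
    static const char digits[] = "0123456789ABCDEF"; /* uppercase is an assumption */
    for (size_t i = 0; i < len; ++i) {
        *out++ = digits[src[i] >> 4];    /* high nibble first */
        *out++ = digits[src[i] & 0x0F];  /* then the low nibble */
    }
    *out = '\0';                         /* terminate the string once, at the end */
}
```

The same structure maps onto assembly with one zero-extending byte load per iteration and two table lookups (or two shift/mask/add sequences), with no push/pop inside the loop.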

1 Answer

`uhex$` is a macro that ships with MASM32. From the masm32 folder, in D:\masm32\macros\macros.asm:


      uhex$ MACRO DDvalue   ;; unsigned DWORD to hex string
        LOCAL rvstring
        .data
          rvstring db 12 dup (0)
        align 4
        .code
        invoke dw2hex,DDvalue,ADDR rvstring
        EXITM <OFFSET rvstring>
      ENDM
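
Since the macro expands to `OFFSET rvstring`, the operand `uhex$(ebx) + 6` is simply the address of the buffer plus 6 bytes. As noted in the comments, the conversion produces 8 hex digits (for example "00000061" for "a"), so adding 6 skips the six leading zeros and points at the last two digits. A rough C model of what the question's loop ends up doing, assuming `dw2hex` writes exactly 8 digits plus a NUL; `dw2hex_sketch` and `hexdump_like_question` are hypothetical names, and the uppercase digits are an assumption:

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical stand-in for MASM32's dw2hex: writes exactly 8 hex digits
   plus a terminating NUL (uppercase digits are an assumption). */
static void dw2hex_sketch(uint32_t value, char *out) {
    snprintf(out, 9, "%08lX", (unsigned long)value);
}

/* Mirrors the question's loop: every byte is converted as a full 32-bit
   value, then the string pointer is advanced by 6 to skip the leading zeros. */
static void hexdump_like_question(const char *text, char *dest) {
    static char rvstring[12];  /* plays the role of the macro's data-section buffer */
    dest[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)text; *p != 0; ++p) {
        dw2hex_sketch(*p, rvstring);  /* e.g. "00000061" for 'a' */
        strcat(dest, rvstring + 6);   /* appends only the last two digits, "61" */
    }
}
```

Calling `hexdump_like_question("abc", out)` with a large enough `out` buffer leaves "616263" in it, which is the per-byte hex dump the original loop builds in `mStr`.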
Nassau