0

I wrote the following program in nasm on a Windows 10 64bit machine

extern _GetStdHandle@4
extern _WriteFile@20
extern _ReadFile@20
extern _ExitProcess@4

;ReadBufferSize
%define rbs 100

section .rodata
    w: dd 3
    h: dd 3
    text: db "abc", 10

section .data
    x: dd 4
    y: dd 4

section .bss
    stdout: resb 4
    stdin: resb 4
    readbuff: resb (rbs+2); allow for CR, LF
    bytecnt: resb 4

section .text
    _print:
        push ebp
        mov ebp, esp

        push 0
        lea eax, [ebp-4]
        push eax
        mov eax, [ebp+8]
        push dword [eax]
        push dword [ebp+12]
        push dword [stdout]
        call _WriteFile@20

        mov esp, ebp
        pop ebp
        ret 8

    _read:
        push ebp
        mov ebp, esp

        push 0
        push dword [ebp+8]
        push rbs
        push dword [ebp+12]
        push dword [stdin]
        call _ReadFile@20
        sub dword [bytecnt], 2; remove CR and LF

        mov esp, ebp
        pop ebp
        ret 8
    _draw:
        push ebp
        mov ebp, esp
        
        push dword [w]
        L1:
            push dword [h]
            L2:
                push text
                push x
                call _print
            dec dword [ebp-8]
            cmp dword [ebp-8], 0
            jg L2
        dec dword [ebp-4]
        cmp dword [ebp-4], 0
        jg L1

        mov esp, ebp
        pop ebp
        ret

    global _main
    _main:
        push -11
        call _GetStdHandle@4 ;Get Stdout handle
        mov [stdout], eax

        push -10
        call _GetStdHandle@4 ;Get Stdin handle
        mov [stdin], eax
        
        call _draw

        push 0
        call _ExitProcess@4

        hlt

I would have cleaned it up, but this may affect something, so I won't (sorry). The _draw function should output "abc\n" 9 (3*3) (3 and 3, because w and h are 3) times on the screen, but it only gets outputed 5 times. As far as I have checked, the _print works as intended. Here's my build script if it helps:

@echo off
IF "%~2" EQU "" set f0rm4t=win32
IF "%~2" NEQ "" set f0rm4t=%~2
if exist %1.exe rm %1.exe
if exist %1.obj rm %1.obj
nasm -f %f0rm4t% %1.asm
if exist %1.exe goto run
if not exist %1.obj goto end
gcc %1.obj -o %1.exe
if %errorlevel% EQU 0 goto run
:run
%1.exe
echo ------------------------
if %errorlevel% EQU -1073741819 (
    echo exited with C0000005
    goto end
)
if %errorlevel% EQU 0 (
    echo exited successfully
    goto end
)
echo exited with return value %errorlevel%
:end

I build using the build filename command, and it functions correctly. I have checked my logic, and it seems fine, so it's something with the nested loop and I don't know what exactly. Complete build command output:

C:\Users\User\files>build filename
abc
abc
abc
abc
abc
------------------------
exited successfully

C:\Users\User\files>

If the loops were working properly, then it would be impossible to get the output 5 times, because it should be (repetitions of the outer one)*(repetitions of the inner one), but 5 is a prime number. I can not figure this out, so I leave it to this community. Have a great day and thanks in advance!

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Gigo_G
  • 123
  • 1
  • 3
  • Your `_draw` function pushes but never pops, so the stack is full of garbage instead of the loop control variables you're imagining. – Erik Eidt Dec 18 '21 at 23:27
  • 2
    @ErikEidt: well, the loop control variables are referenced through `ebp`, so those are not effected by the pushes, but you're right that the pushes of `w` and `h` do not seem to serve any useful purpose here. – 500 - Internal Server Error Dec 18 '21 at 23:44
  • @500-InternalServerError , `w` and `h` are constants declared in `section .rodata`, and I do not want to use them for loop control variables – Gigo_G Dec 19 '21 at 11:47

2 Answers2

3

The first time through the outer loop, [ebp-8] counts down to zero. The next two times through the outer loop, [ebp-8] is already zero (or less), so the inner loop only executes once. For a total of five calls to print.

Note that the subsequent executions of push [h] don't change [ebp-8]. On the second iteration, it writes to [ebp-12] and on the third iteration it writes to [ebp-16], because the stack pointer continues to grow downward with each push.

To fix this, I suggest:

  1. Add sub esp, 8 right after mov ebp, esp.
  2. Change push [w] to mov eax, [w]; mov [ebp-4], eax.
  3. Change push [h] to mov eax, [h]; mov [ebp-8], eax.
prl
  • 11,716
  • 2
  • 13
  • 31
  • 2
    Note that the fix suggested here is based on minimizing changes to your program just to fix the problem. There are other changes I would suggest: Keep the loop counters in registers. Dec sets flags, so you can use jg or jnz immediately after dec without a cmp instruction. – prl Dec 19 '21 at 00:52
2

After looking at the beginning of @prl 's answer, I realised that, like they said, [ebp-8] gets decremented to 0 in the first iteration of the inner loop and the next push [h] pushes to [ebp-12], so [ebp-8] remains at 0. So the first thing I tought of was just adding a pop edx (using edx as a junk register) after the jumps, like this:

    _draw:
        push ebp
        mov ebp, esp
        
        push dword [w]
        L1:
            push dword [h]
            L2:
                push text
                push x
                call _print
            dec dword [ebp-8]
            cmp dword [ebp-8], 0
            jg L2
            pop edx
        dec dword [ebp-4]
        cmp dword [ebp-4], 0
        jg L1
        pop edx

        mov esp, ebp
        pop ebp
        ret

this would clear the already zeroed variables of the stack and make room for the new ones, and it works.

But I think @prl 's answer is more applicable in general for a technique for allocating variables on the stack, so if you can, use it, as it is alslo less confusing.

Gigo_G
  • 123
  • 1
  • 3
  • See also [What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?](https://stackoverflow.com/q/49485395) - it's slightly less efficient to use repeated push/pop than `mov edx, [h]` / `mov [ebp-8], edx`. `push mem` decodes to 2 fused-domain uops on modern CPUs like Skylake and Zen (same as separate load and store insns), and using it means you need to balance it with `pop`. This is actually good for code-size, though. Of course, still makes more sense to save/restore e.g. EBX and EDI for use as counters, and `dec ebx / jg L2` – Peter Cordes Dec 19 '21 at 12:00