printing numbers in nasm

Question

I have written assembly code to print the numbers from 1 to 9, but it only prints 1: no digit other than 1 is printed, and only one output is produced. That means the loop isn't running either. I can't figure out what is wrong with my code.

section .bss

        lena equ 1024
        outbuff resb lena

section .data

section .text

        global _start
        _start:
                nop
                mov cx,0

                incre:
                inc cx
                add cx,30h
                mov [outbuff],cx

                cmp cx,39h
                jg done

                cmp cx,39h
                jl print


                print:
                mov rax,1           ;sys_write
                mov rdi,1
                mov rsi,outbuff
                mov rdx,lena
                syscall
                jmp incre

                done:
                mov rax,60          ;sys_exit
                mov rdi,0
                syscall

My OS is 64-bit Linux. The code is built using nasm with the following commands: `nasm -f elf64 -g -o num.o num.asm` and `ld -o num num.o`

You can find this information here: https://stackoverflow.com/questions/31143237/printing-a-number-in-assembly-nasm-using-printf – Oct 01 '17 at 10:29
Learn to use a debugger (`gdb` should be available, and you may then try some GUI extension over it; I personally prefer `edb-debugger`, but you will have to build it from source, which may be tricky if you are not used to that). – Ped7g, Oct 01 '17 at 11:51
You should not change the code in the question after receiving an answer. That makes the problem invisible and invalidates the answer! — Cody Gray - on strike, Oct 01 '17 at 17:30

Shachar Shemesh · Accepted Answer · 2017-10-02T06:20:36.427


Answer rewritten after some experimentation.

There are two errors in your code, and a few inefficiencies.

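First, you add 0x30 to the number (to turn the number 1 into the ASCII character "1"). However, you do that addition inside the loop, so it accumulates: in the first iteration cx is 0x31, in the second 0x62 ("b"), in the third 0x93 (an invalid UTF-8 sequence), and so on. Even on its own, this bug stops the loop early: 0x62 is already above 0x39, so `jg done` exits after printing only the first character.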

Just initialize cx to 0x30 and remove the add from inside the loop.

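But there's another problem: the `syscall` instruction itself overwrites RCX (with the return address) and R11 (with RFLAGS), so the counter in cx is clobbered by every system call. The kernel preserves the other general-purpose registers apart from RAX (which receives the return value), so replacing cx with r12 makes the program work.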
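Switching to r12 is the answer's fix. Another option, sketched here and not part of the original answer, is to keep the counter in rcx and save it on the stack around each `syscall`. This is a minimal sketch assuming the same `nasm -f elf64` / `ld` build as the question; the one-byte `outbuff`, the `'0'`/`'9'` character constants and the unsigned `jb` comparison are choices made for this sketch:

```nasm
; Sketch: counter stays in rcx, but is pushed/popped around syscall,
; because syscall overwrites rcx (return RIP) and r11 (RFLAGS).
section .bss
        outbuff resb 1              ; only one character is written per call

section .text
        global _start
_start:
        mov     ecx, '0'            ; start at ASCII '0' (0x30); also zeroes upper rcx

incre:
        inc     ecx                 ; '1', '2', ... '9'
        mov     [outbuff], cl       ; store only the low byte

        push    rcx                 ; save the counter: syscall will clobber rcx
        mov     eax, 1              ; sys_write
        mov     edi, 1              ; fd 1 = stdout
        mov     rsi, outbuff
        mov     edx, 1              ; write exactly one byte
        syscall
        pop     rcx                 ; restore the counter

        cmp     ecx, '9'
        jb      incre               ; loop until '9' has been printed

        mov     eax, 60             ; sys_exit
        xor     edi, edi            ; exit status 0
        syscall
```

Built with the question's commands, this prints `123456789` (no trailing newline), like the answer's programs.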

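In addition, you pass the whole buffer length (1024) to write, but the buffer holds only one character, so each call asks write to output 1024 bytes, almost all of them zero. Passing a length of 1 fixes that. The program so far: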

section .bss

        lena equ 1024
        outbuff resb lena

section .data

section .text

        global _start
        _start:
                nop
                mov r12,30h

                incre:
                inc r12
                mov [outbuff],r12

                cmp r12,39h
                jg done

                cmp r12,39h
                jl print


                print:
                mov rax,1           ;sys_write
                mov rdi,1
                mov rsi,outbuff
                mov rdx,1
                syscall
                jmp incre

                done:
                mov rax,60          ;sys_exit
                mov rdi,0
                syscall

Even now, though, the code is still quite inefficient. You have two compares on the same condition, and one of them branches to the very next instruction.

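Also, your code would be smaller and faster if you moved the loop condition to the end of the loop, so each iteration needs only one compare and one conditional branch. Also, cx is a 16-bit register and r12 is a 64-bit register, but we actually only need 8 bits. Using larger registers than needed can make instructions longer (for example, a 16-bit `mov cx,0` carries an operand-size prefix and a 2-byte immediate), and `mov [outbuff],r12` stores 8 bytes when only one is needed, wasting space in memory and the cache. We therefore switch to r12b, the 8-bit variant of r12. After these changes, we get: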

section .bss

        lena equ 1024
        outbuff resb lena

section .data

section .text

        global _start
        _start:
                nop
                mov r12b,30h

                incre:
                inc r12b
                mov [outbuff],r12b

                mov rax,1           ;sys_write
                mov rdi,1
                mov rsi,outbuff
                mov rdx,1
                syscall

                cmp r12b,39h
                jl incre

                mov rax,60          ;sys_exit
                mov rdi,0
                syscall

There's still a lot more you can do. For example, you call the write system call 9 times instead of filling the buffer and then calling it once (even though you've allocated a 1024-byte buffer). It will probably be faster to initialize r12 with zero (xor r12, r12) and then add 0x30 (not relevant for the 8-bit version of the register).
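A minimal sketch of the "fill the buffer, then call write once" idea, assuming the same 64-bit Linux nasm/ld build as the question. The trailing newline, the 16-byte buffer and the `fill` label are illustrative choices, not taken from the answer:

```nasm
; Sketch: store '1'..'9' plus a newline in the buffer first,
; then issue a single write system call for the whole thing.
section .bss
        outbuff resb 16             ; 9 digits + newline fit easily

section .text
        global _start
_start:
        mov     al, '1'             ; current digit
        mov     rdi, outbuff        ; write pointer into the buffer

fill:
        mov     [rdi], al           ; store one digit
        inc     rdi
        inc     al
        cmp     al, '9'
        jbe     fill                ; stop once '9' has been stored

        mov     byte [rdi], 10      ; append a newline
        inc     rdi

        mov     rsi, outbuff        ; start of the data
        mov     rdx, rdi
        sub     rdx, rsi            ; length = end pointer - start (10 here)
        mov     eax, 1              ; sys_write
        mov     edi, 1              ; fd 1 = stdout (write pointer no longer needed)
        syscall                     ; one system call for all the output

        mov     eax, 60             ; sys_exit
        xor     edi, edi            ; exit status 0
        syscall
```

This prints `123456789` followed by a newline with one `write` instead of nine.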

edited Oct 02 '17 at 06:20

answered Oct 01 '17 at 10:46

Shachar Shemesh


I have removed add cx,30h from inside the loop but the output is still the same – Sourav Rai Oct 01 '17 at 13:02
Thanks @shachar, I debugged the program and found that after the syscall in the print label the value of rcx changed to some other hex number; that's why, even after putting add cx,0x30 outside the loop, only one digit was printed. – Sourav Rai Oct 01 '17 at 13:30
I've amended my answer to include the clobbering of rcx. I've also pointed a few inefficiencies in your implementation. Good luck. – Shachar Shemesh Oct 01 '17 at 13:35

I don't think using a whole 64b register for a single char is very accurate, overwriting 8 bytes of memory every time. How about `r12w` to simulate the original 16bit, or probably even better `r12b`, as the output will use only a single byte anyway? – Ped7g Oct 01 '17 at 14:06
@Ped7g thanks. I hate the Intel mess called "assembly" and was too lazy to look up how to do that. I also dimly remembered that it was not possible for the new 64bit registers (I think what's not possible is to use both halves of the lower 8 bits). Anyways, I've incorporated that into my answer. Thank you again. – Shachar Shemesh Oct 02 '17 at 06:22
@ShacharShemesh yes, there's no way to use b8-b15 the way the original `ah` of `ax` works. Actually, `ah` itself is not accessible for many instructions in 64b mode, as that encoding is now used for the new registers like `r12` IIRC. I quite like x86 assembly; it's a bit hairy and contains too much legacy, but it's quite OK for writing some asm by hand. By contrast, while writing some examples for MIPS or ARM I feel a lot more constrained and have to add more mundane code around for housekeeping, while x86 asm often gets more to the point. (doesn't apply to compilers of course, ARM=ok) – Ped7g Oct 02 '17 at 06:26

@Ped7g so you basically agree with me it's a mess, you just like your stuff messy :-) – Shachar Shemesh Oct 02 '17 at 06:31
yeah. Thinking about it... as I prefer C++ over Java for example... I don't mind messy tools, as long as they allow me to produce reasonably clean product (source). While presumably shining tools (Java) leading to horrible corporate-grade monstrosities don't amuse me so much, state of tool is important for me, but not as much as results. :) But looks like I'm heavily biased and subjective (you know, once you invest some decade or two of life into something...). :) – Ped7g Oct 02 '17 at 06:36
I, too, prefer C++ to Java, but that has nothing to do with messy. When you write in ARM assembly, the CPU runs what you wrote. When you write for Intel, the CPU translates your asm to some unrelated language and runs that. Writing efficient code means guessing what the CPU will do and planning for it. In other words: Intel is the Java of assemblies. – Shachar Shemesh Oct 02 '17 at 06:39
Intel maps most x86 instructions to a single internal uop. It's hardly "unrelated". Other than cmp/jcc macro-fusion, it doesn't get "optimized" at run-time across instruction boundaries. If you want a CPU that's like Java, look at Transmeta's Crusoe [which dynamically recompiles x86 to an internal VLIW instruction set](http://archive.arstechnica.com/cpu/1q00/crusoe/m-crusoe-1.html) a lot like a JIT VM. Intel CPU pipelines are fairly well understood; http://agner.org/optimize/ – Peter Cordes Oct 02 '17 at 15:50

@Ped7g: `ah/bh/ch/dh` aren't encodeable with a REX prefix because a REX prefix changes the meaning of that encoding to `dil/sil/bpl/spl`. This makes the ISA more regular, so you can always store or `movzx` the low byte of any register, making it a better compiler target (although a good compiler would try to allocate registers to minimize the amount of REX prefixes for code-size reasons.) – Peter Cordes Oct 02 '17 at 15:53

`xor r12d, r12d` is better than `xor r12, r12`. Silvermont/KNL doesn't recognize 64-bit xor as a zeroing idiom, only 32-bit. (I have a half-finished edit for [my xor-zeroing answer](https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and) which I should really post...) Or better, you could use `ebx` or `ebp`. Or if you're only making one system call at the end, you can use `ecx` because it doesn't have to survive the `syscall`. Also, the OP could have used the stack instead of a static buffer. – Peter Cordes Oct 02 '17 at 16:00
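A sketch combining the last two suggestions in that comment: the digits are built in a buffer on the stack rather than in `.bss`, and the counter lives in ecx, which is safe because the only `syscall` happens after the loop. It assumes 64-bit Linux and the question's nasm/ld commands; the 16-byte stack reservation and the appended newline are choices made for this example:

```nasm
; Sketch: stack buffer instead of .bss, ecx as the loop index,
; a single sys_write after the loop (so rcx need not survive a syscall).
section .text
        global _start
_start:
        sub     rsp, 16             ; reserve 16 bytes of stack as the buffer
        xor     ecx, ecx            ; index = 0; 32-bit xor also zeroes upper rcx

fill:
        lea     eax, [rcx + 0x31]   ; digit = index + 0x31 (ASCII '1')
        mov     [rsp + rcx], al     ; buffer[index] = digit
        inc     ecx
        cmp     ecx, 9
        jb      fill                ; indices 0..8 -> '1'..'9'

        mov     byte [rsp + rcx], 10 ; newline after the digits (rcx = 9 here)

        mov     eax, 1              ; sys_write
        mov     edi, 1              ; fd 1 = stdout
        mov     rsi, rsp            ; buffer starts at the stack pointer
        mov     edx, 10             ; 9 digits + newline
        syscall                     ; clobbers rcx/r11, which are no longer needed

        mov     eax, 60             ; sys_exit
        xor     edi, edi            ; exit status 0
        syscall
```

Because ecx is only used before the single `syscall`, no callee-preserved register such as r12 or ebx is needed here.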
