I am new to assembly. I want to write a function in assembly that prints a number, and call it from C++. See the code below.

cpp:

#include <iostream>
#include <stdint.h>

extern "C" void printunum(uint64_t);

int main()
{
    printunum(12345);
    std::cout<<std::endl;
    return 0;
}

assembly:

   global printunum
section .text
printunum:
   mov rax,rdi
   mov rdi,10
   mov rsi,0

while1:
   cmp  rax,0
   je   endwhile1
   mov  rdx,0
   div  rdi
   inc  rsi
   add  rdx,48
   push rdx
   jmp  while1
endwhile1:

   mov r10,rsi

while2:
   cmp r10,0
   je  endwhile2
   mov rax,1
   mov rdi,1
   mov rsi,rsp  ;to pass memory address 
   add rsp,8    ;it is equal to pop or it is wrong and it will add 8 bytes here?
   mov rdx,1
   syscall
   dec r10
   jmp while2
endwhile2:
   ret

Edited assembly:

      global print_uint64
section .text
    print_uint64:

   ;init
      mov rax,rdi
      mov rdi,10
      mov rsi,rsp
   ;endinit

   while:
      xor  rdx  ,rdx
      div  rdi
      add  rdx  ,48
      dec  rsi
      mov  [rsi],dl
      cmp  rax  ,0
      je   else 
      if: 
      jmp  while
      else:
   endwhile: 

   ;print
      mov rax,1
      mov rdi,1
      mov rdx,rsp
      sub rdx,rsi
      syscall
   ;endprint 

   ;return
      mov rax,rsp
      sub rax,rsi
      ret

I compiled it with:

srilakshmikanthanp@HP-245-G5-NOTEBOOK-PC:~/Documents/Learn$ nasm -f elf64 asm.asm
srilakshmikanthanp@HP-245-G5-NOTEBOOK-PC:~/Documents/Learn$ g++ main.cpp asm.o -o main 
srilakshmikanthanp@HP-245-G5-NOTEBOOK-PC:~/Documents/Learn$ ./main
12345
srilakshmikanthanp@HP-245-G5-NOTEBOOK-PC:~/Documents/Learn$

My processor is amd64 (x86-64) and I am using Kali Linux.

1) Is the above code fine, or is it wrong? (Is pop equivalent to adding 8 to rsp on a 64-bit machine?)

2) The mul instruction puts its result into rdx:rax. How can I store it in a single memory location?

3) The div instruction takes its dividend from rdx:rax. How can I put a single value into rdx:rax? Thanks for your response.

srilakshmikanthanp
  • This code is very inefficient (storing each character in a separate qword, and making a separate system call per digit) but it's not wrong. Well, other than treating the arg as unsigned 64-bit when it's actually declared in C to only take a 32-bit `int`; usually the caller will write a 32-bit register so it implicitly zero-extends to 64-bit. – Peter Cordes Apr 14 '20 at 05:56
  • No need to load the value into a register, just point RSI at it, so yes, only the rsp+=8 part of pop is needed. It could just `add rsi,8` and restore RSP at the end, outside the loop, instead of using 2 instructions. But obviously this code doesn't care at all about efficiency. It also uses `cmp/je` at the tops of loops instead of the bottom like a normal asm loop. That means if you pass it `0`, it will print no digits instead of a `'0'`. See [How do I print an integer in Assembly Level Programming without printf from the c library?](https://stackoverflow.com/a/46301894) for simpler code – Peter Cordes Apr 14 '20 at 05:58
  • @Peter Cordes After your comments I changed some things in my code and added the result as the edited assembly; can you take a look? Also, the result of mul is stored in rdx:rax; how do I store it in a single memory location? And div takes its dividend from rdx:rax; how can I put a single value into rdx:rax? Can you explain? – srilakshmikanthanp Apr 14 '20 at 06:35
  • Are you multiplying to calculate how to restore RSP? Don't use `mul` for powers of 2, just left shift. Or better, save the original RSP in another register so you can just `mov` instead of calculating anything. Also, you're now printing a buffer with 7 bytes of zeros between every digit. Most programs will choke on that, although it will look right on a terminal. But pipe it into `less` or `hexdump -C` to see what you actually output. Don't use `push` in the first place to store 8 bytes, just use `mov` and decrement a pointer like in my linked answer. – Peter Cordes Apr 14 '20 at 06:54
  • @Peter Cordes Following your suggestion and your linked answer, I have updated the edited assembly. Can you take a look? – srilakshmikanthanp Apr 14 '20 at 12:57
  • Yup, looks like you fixed all the bugs. :) And yes, subtracting pointers is the right way to handle this. And yes now you're storing contiguous characters so if you `strace ./main` you should see no extra garbage in the write syscall. Your loop condition is still a mess, though: `cmp rax, 0` / `jne while` will keep looping like `do{}while(RAX!=0);` - otherwise it falls through. Don't conditionally jump over a `jmp`, just `jcc` backwards with the opposite condition. – Peter Cordes Apr 14 '20 at 16:33
  • x86-64 System V has a 128-byte "red zone" below RSP that you can use without doing `sub rsp, 16` first; it's guaranteed not to be overwritten by signal handlers or whatever. So it is actually safe to use `rsp` as the top of your buffer. – Peter Cordes Apr 14 '20 at 16:35
  • @PeterCordes I read somewhere that conditional jumps can only be used over short distances, which is why I wrote it this way. Is that statement correct, or can I use a conditional jump over a long distance? – srilakshmikanthanp Apr 14 '20 at 17:04
  • That was true for 16-bit x86 before 386 - `jcc rel16/32` instead of `jcc rel8` was a new encoding in 386 (https://ulukai.org/ecm/insref.htm#i256). But "long distance" means farther than -128 .. +127 bytes, and your loop is *not* that large. In 32 and 64-bit mode `jcc` has the same range as `jmp` because the `jcc rel32` encoding is always available. You never need to think about this (except when jumping farther than +-2GiB, then neither jmp nor jcc can reach. In that case you need the 64-bit absolute address in a register for an indirect jmp, not jcc) https://www.felixcloutier.com/x86/jcc – Peter Cordes Apr 14 '20 at 17:25
  • Your loop is a handful of instructions, probably less than 30 bytes; if you look at the machine code for the `jmp` encoding (or the `jcc` after you simplify), it will be using the 1-byte `rel8` encoding with a pretty small (negative) offset. – Peter Cordes Apr 14 '20 at 17:28

2 Answers


What POP does is:

  • move data from [RSP] to the target register
  • add size of register to RSP

So if you don’t actually need the data that is on the stack, you can just add the size of the data to RSP, and it has the same effect on the stack pointer.
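
As a rough illustration of those two steps, here is a toy C++ model of a downward-growing stack. It is only a sketch: `ToyStack`, `push`, `pop` and `discard` are hypothetical names invented for this example, and `rsp` is just a byte offset into an array, not the real register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical toy model (not real stack manipulation): a small byte array
// stands in for stack memory and `rsp` is a byte offset into it.
struct ToyStack {
    static constexpr std::size_t kBytes = 64;
    unsigned char mem[kBytes] = {};
    std::size_t rsp = kBytes;  // empty stack: offset one past the end

    // push: RSP -= 8, then store the qword at [RSP]
    void push(std::uint64_t value) {
        assert(rsp >= 8 && "toy stack overflow");
        rsp -= 8;
        std::memcpy(mem + rsp, &value, 8);
    }

    // pop: load the qword at [RSP], then RSP += 8
    std::uint64_t pop() {
        assert(rsp + 8 <= kBytes && "toy stack underflow");
        std::uint64_t value;
        std::memcpy(&value, mem + rsp, 8);
        rsp += 8;
        return value;
    }

    // discard: only the RSP += 8 half of pop, like `add rsp, 8`
    void discard() {
        assert(rsp + 8 <= kBytes && "toy stack underflow");
        rsp += 8;
    }
};
```

In this model, calling `pop()` on one stack and `discard()` on an identical one leaves both with the same `rsp`; the only difference is that `discard()` never reads the value. The offset moves by 8 because each slot is 8 bytes wide, which matches a 64-bit push/pop.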

Sami Kuhmonen
  • In rsp+8, the 8 is in bytes, right? Does incrementing a pointer or memory address always count in bytes? – srilakshmikanthanp Apr 14 '20 at 05:01
  • Can you look at the edited assembly? Also, the result of mul is stored in rdx:rax; how do I store it in a single memory location? And div takes its dividend from rdx:rax; how can I put a single value into rdx:rax? Can you explain? – srilakshmikanthanp Apr 14 '20 at 06:44
  • Why is the 'size of register' added to RSP? – Kaushik Sep 02 '21 at 08:40
  • @Kaushik Since that much data is taken from the stack, the stack pointer must be adjusted by the same amount. If you pop a 16-bit register you must move it by 2 bytes; if 32-bit, then by 4 bytes, etc. – Sami Kuhmonen Sep 02 '21 at 08:57

This example applies fixes suggested in some of the previous comments and is 24 bytes shorter than your version. Some of the math is also already done: the difference between RBP and RDI gives the string length, which is the same as using a counter in R10.

Building a string with push is a novel idea, but as pointed out, the 7 NUL bytes after every character could cause a problem on some systems, and it also makes the buffer unnecessarily large.

    global printunum        ; Name matches the extern "C" declaration in the question.
    section .text

printunum:

; The maximum signed 64 bit value ( 9223372036854775807 ) needs a maximum of 19 digits or
; 20 for unsigned values. Create a buffer on the stack suitably large enough for that.

    push    rbp
    mov     rbp, rsp
    sub     rsp, 32         ; Reserve 32 bytes so stack stays QWORD aligned.

    mov     rax, rdi        ; Move value passed by caller to be converted.
    mov     rdi, rbp        ; Establish pointer to next byte past EOS (End of String).
    mov     ecx, 10         ; Divisor

LtoA:
    xor     edx, edx
    div     rcx
    or      dl, '0'         ; Remainder 0..9 (old RAX % 10) becomes ASCII '0'..'9'.
    dec     rdi             ; This way RDI always points to most recent character written.
    mov     [rdi], dl       ; Store the digit.
    test    rax, rax
    jnz     LtoA

showString:
    mov     al, 1           ; RAX is 0 when the loop exits, so this makes RAX = 1 (SYS_write).
    mov     rsi, rdi        ; RSI points to beginning of string.
    mov     edi, eax        ; EDI = 1, the file descriptor for STDOUT.
    mov     rdx, rbp
    sub     rdx, rsi        ; Number of byte to be displayed.
    syscall

    leave                
    ret
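
For comparison, the same conversion loop can be sketched in C++. This is only an illustrative equivalent, not part of the assembly above: `uint64_to_decimal` is a hypothetical helper name, and it returns a `std::string` instead of calling `write` so that the result is easy to inspect.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper mirroring the LtoA loop: digits are produced least
// significant first, so the buffer is filled from its end toward its start.
std::string uint64_to_decimal(std::uint64_t value) {
    char buf[20];                        // 20 digits cover any unsigned 64-bit value
    char* const end = buf + sizeof buf;  // plays the role of RBP (one past the string)
    char* p = end;                       // plays the role of RDI

    do {
        *--p = static_cast<char>('0' + value % 10);  // remainder -> ASCII digit
        value /= 10;                                 // quotient feeds the next round
    } while (value != 0);  // test at the bottom, so an input of 0 still yields "0"

    // Length is the pointer difference, like `mov rdx, rbp` / `sub rdx, rsi`.
    return std::string(p, static_cast<std::size_t>(end - p));
}
```

The `value % 10` and `value /= 10` pair corresponds to the single `div rcx`, which leaves the quotient in RAX and the remainder in RDX. The `xor edx, edx` before it zeroes the upper half of the 128-bit dividend RDX:RAX, which is how a single 64-bit value is divided.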
Shift_Left