How to implement CALL & RET to an assembler?

Question

According to this made-up microprocessor architecture's instruction set: https://github.com/mertyildiran/DASM

In our toy assembler written in C, we have implemented PUSH & POP instructions as shown below:

PUSH:

Which is basically the combination of 4 DEC and 1 ST instruction.

        else if (strcmp(token,"push")==0) // PUSH instruction: combination of 4 DEC and 1 ST instruction on Stack Pointer (SP)
        {
            op1 = strtok(NULL,"\n\t\r ");
            op2[0] = sp; // Let's say address of SP is 9
            printf("\n\t%s\t%s\n",strupr(token),op1);
            ch = (op2[0]-48) | ((op2[0]-48)<<3); // Prepare bitwise instruction format for DEC instructions
            program[counter]=0x7800+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
            program[counter]=0x7800+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
            program[counter]=0x7800+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
            program[counter]=0x7800+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;

            ch = ((op1[0]-48) << 2) | ((op2[0]-48) << 6); // Prepare bitwise instruction format for ST instruction
            program[counter]=0x3000+((ch)&0x00ff); // Store the value in Stack
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
        }

POP:

Which is basically the combination of 1 LD and 4 INC instructions.

        else if (strcmp(token,"pop")==0) // POP instruction: combination of 1 LD and 4 INC instructions on Stack Pointer (SP)
        {
            op1 = strtok(NULL,"\n\t\r ");
            op2[0] = sp; // Let's say address of SP is 9
            printf("\n\t%s\t%s\n",strupr(token),op1);

            ch = (op1[0]-48) | ((op2[0]-48) << 3); // Prepare bitwise instruction format for LD instruction
            program[counter]=0x2000+((ch)&0x00ff); // Store the value in Stack
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;

            ch = (op2[0]-48) | ((op2[0]-48)<<3); // Prepare bitwise instruction format for INC instructions
            program[counter]=0x7700+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
            program[counter]=0x7700+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
            program[counter]=0x7700+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
            program[counter]=0x7700+((ch)&0x00ff); // Decrease Stack Pointer 4 times
            printf("> %d\t%04x\n",counter,program[counter]);
            counter++;
        }

So my question is how can we implement CALL & RET instructions with using a stack?

I'm aware that CALL instruction will store current state of PC in the stack so the program will be able to return where it left with RET instruction. But it's leading me to two subquestions:

After CALL executed, how the program get address previously stored in the stack if in the subprocedure, an instruction PUSHed something to the stack or overwrite the return address of CALL.
How can we pass the related label's address to CALL in machine code level? In our assembler JMP & jZ instructions are also not completed, because of the same reason.

If you want to look at the whole picture: https://github.com/mertyildiran/DASM/blob/master/assembler.c

For 2. subquestion(JMP/CALL) you can explain it over this example with maybe jmp lpp line:

.data
     count: 60
     array: .space 10
     char: 0xfe
.code
        ldi 0 count
        ld  0 0
        ldi 1 array
        ldi 2 char
        ld  2 2
lpp     st 1 2
        inc 1
        dec 0
        jz loop
        jmp lpp
loop    sub 1 2 3
lp1     jmp lp1

"how the program get address previously stored in the stack if in the subprocedure, an instruction PUSHed something to the stack or overwrite the return address of CALL." ... CPU doesn't care, the next `RET` will return to the wrong address (the new/modified value in stack front, when ret is called). — Ped7g, Jul 20 '16 at 15:51
@Ped7g Then it will be the assembly program's fault not the assembler's, right? — mertyildiran, Jul 20 '16 at 15:54
`call label` can be break down for example as `push next_instruction` `jmp label` `next_instruction: ...` (`next_instruction:` being fictional implicit label used internally by assembler, actually you can do just `$+size_of_call` without defining the symbol) ... Yes, it's program responsibility to keep stack values meaningful, just as you have to keep meaningful value in registers and executed instructions. The CPU just does, what it is told to. So in sub-routine, if you `push` three words on stack, you should either `pop` size of three words, or `add sp,3*word`, before calling `ret`. — Ped7g, Jul 20 '16 at 15:56
and `ret` can be imagined as `pop ip` (where `ip` is instruction pointer register) .. so the next instruction will be fetched from the address, which was on "top" of stack. — Ped7g, Jul 20 '16 at 15:58
@Ped7g One more thing, could you explain how an assembler pass label address to JMP instruction? — mertyildiran, Jul 20 '16 at 16:01
Added some more comments about source in answer.. But now I finally figured out, what I wanted to say about "breaking down" instructions into the basic ones. That should be not responsibility of assembler. Assembler should produce only direct 1:1 mapping from written instruction in text encoding into machine code binary bytes. What you are doing in your `push` handling can be viewed either as macro compiler, or higher level language compiler. In that case you may do two compilers, one translating extended asm into basic asm (push -> store+4xdec), other to translate store/dec into binary bytes. — Ped7g, Jul 20 '16 at 18:38
For a detailed review, how relative jumps work in x86, you may want to check this neat answer on different topic. (I understand it's not directly relevant to your platform, but it may help to better understand what assemblers do): http://stackoverflow.com/a/38487923/4271923 — Ped7g, Jul 20 '16 at 22:14

Ped7g · Accepted Answer · 2016-07-20T18:31:56.287

"how an assembler pass label address to JMP instruction?"

That should be obvious from the machine/CPU architecture, how it does decode the instructions. I'm a bit confused from your question, whether the made-up CPU is already final (then it should describe, how you can load pc register = that's basically what jump does, plus it may be conditional), or you are doing two projects in one, creating both the HW machine specs, and an assembler for it. And I'm too lazy to read it again, so I will just show you common ways from real world.

ARM - RISC-like instruction set, fixed opcode size:

Both in 16b (thumb) and 32b mode the first few bits of opcode specify the instruction (B, BL, BLX, B is pure jump, BL is similar to CALL, but not using stack, a "link" register is used instead to store return address), the rest of bits specify either register holding the target address (so to call function foo you can do load r0,foo bl r0), or immediate value which is relative to current PC.

Which means, that in 16b mode you can jump +-2kiB unconditionally, or -252 to +258 conditionally (the condition variant is encoded in further bits, taking away some from the immediate).

This leads sometimes to situation, where high level compilers either use the register variant, or jump near enough to another jump instruction which hops even further.

In 32b mode the remaining bits reserved for immediate give you much better range, about +-32MiB for all variants. (there's one more mode, 32b "thumb" which has one more different encoding, but that's irrelevant in this example).

Fun fact, the bit-0 in address does specify whether the instruction there is in thumb mode, or full 32b mode, as all addresses must be aligned on ARM, so jump to address 0x00000001 is jump to 0x00000000 with switch of CPU to thumb mode.

x86 - CISC-like historically extended beyond good taste

this one has almost all possible variants (except the one you really need in that long .asm you have been writing last 3 months). Instructions encoding has variable length, so the instruction does use as many bytes, as decided by Intel to be needed.

relative jumps (short with single byte -128 ... +127) - check
absolute jumps to address (encoded as jmp opcode byte, then as many bytes as needed to encode the address)
register jumps (to the address stored in register)
conditional relative jumps (not just "extended" jmp encoding as in ARM case, but specific opcodes)
extras like loop, jcxz/jecxz and maybe some more I even forgot.

So it boils down either to load the value into some register, or encode the immediate into the instruction opcode, then either treat it as an absolute address (which for example on 32b platform, where you manage to encode 25 bits for address, would allow you to address 32MiB of RAM; instead of 4GiB), or as relative address (25bits => pc+-16Mi).

Addendum, which is maybe part of your problem? Like how to "know" what address will be at some future label?

Most of the assemblers are two-pass, so they first generate the instruction opcodes (and can calculate the length of each part of source), and collect all symbols in symbol table, then all address symbols are defined according to the opcodes lengths and org directives. Then in second pass you fill up all immediate values in opcodes by the particular symbol value.

This also shows, why two-pass assemblers can't handle jmp rel8/rel16 (relative jump defined either by 8b or 16b immediate) automatically, but programmer has to specify which one he wants to use. (or use multi-pass assembler, which will try rel8 first, when failed, it will recode the jmp with rel16 and move+recompile everything beyond that point).

Looking at the example source of your push encoding I feel like there's some work for you to change the way your assembler works... (it's ugly anyway, like both creating opcodes and also printing output on screen - don't hesitate to write it one more time, I bet it will turn out more clean and simpler).

edit: finally, I probably understand that piece of assembler better.

So program is uint16_t[], right? (should have been part of question).

And the instruction machine codes are fixed size, 16b too.

But then your counter counts in 16b words... so is that the addressing mode of the target architecture?

That at address 0 is stored 16b word, and on address 1 is stored the next 16b word? (and it's not overlapping, as at x86, where addresses are counted in 8b bytes => in such case first instruction word would be at addresses 0 and 1 and second word would be at addresses 2 and 3. Make sure your counter follows correct addressing scheme (or you will have to convert it when defining value of label symbols).

And your asm is producing output like 0x0000 0x7802: address 0, word 7802, doing some instruction 78 with params 2 (unclear endianness! Either you will be lucky to have the same one on compiling host, or you will end with wrong machine code with swapped bytes in each word), next opcode 0x0001 0x7803, ... etc...

So it looks like you have fixed 16b size for instruction encoding, not sure if jmp will eat whole 8b like store/load instructions, or that one is special, keeping more bits for immediate encoding. If only other byte is available, only meaningful way to use it is as signed 8b -128..+127 relative jump.

If addressing works with 16b words, then you can effectively jump -128..+127 instructions back/forward. If addressing works with 8b bytes, your range is down to -64..+63 instructions only. Hard to tell, as I didn't figure out any details about target platform (you should have added some link or something, at least example how jmp is encoded and how memory is mapped).

Could you explain the example in my question in machine(hexadecimal) from `lpp st 1 2` to `jmp lpp`? Because I want to understand how **jmp** and **loop** works actually in machine code. Assembler is basically generating this file line by line machine code equivalent of assembly, so you can explain in this format: https://github.com/mertyildiran/DASM/blob/master/RAM — mertyildiran, Jul 20 '16 at 16:54
I'm sorry if I'm asking too much but it's hard topic to understand for me. Please explain it like explaining to a dummy. — mertyildiran, Jul 20 '16 at 16:55
@mertyildiran I added one more paragraph, which sort of answers your "line by line". That's not possible (only for jumps going back, to symbol which you already compiled and could have stored in symbol table). I have trouble to understand what you mean by `lpp st 1 2 jmp lpp` way... But the machine code should be like ` <1> <2> <-5>` where ` <-2>` is jumping to itself in infinite loop. I think you don't use three bytes for `st` and `1 2` is somehow encoded together with ``? So adjust it, but I hope the principle will be visible. — Ped7g, Jul 20 '16 at 17:01
@mertyildiran so basically you should create a symbol table, where when label `lpp` is reached, the `counter` value is stored in the symbol table, for example `lpp = 54`. Then in second pass when `jmp lpp` at counter 33 is stored as two byte opcode (rel8), and the `pc` will point already behind `jmp` (architecture definition), you know you have to add 54-35 (=+19) to land at 54 `pc`. So you encode that `jmp lpp` in second pass as ` <+19>`. (encoded as ` ? or symbol index>` during first pass) — Ped7g, Jul 20 '16 at 17:06
@mertyildiran But maybe I seriously misunderstood what your assembler is doing. Usually assembler is producing machine code bytes from assembly language. So I'm confused why your `push` assembly creates `store + 4x dec` instructions, the target CPU does not have native `push`? Or `sub/add` to do `sub sp,4/add sp,-4`? I hope you will somehow manage to put it together, sorry if I confused you by something irrelevant to your project. — Ped7g, Jul 20 '16 at 17:12
It has also native push OK I understood now. Thank you so much. — mertyildiran, Jul 20 '16 at 17:15
x86: `absolute jumps to address (encoded as jmp opcode byte, then as many bytes as needed to encode the address)`. Only "far" jmp has an immediate absolute encoding. "near" jumps are either short (rel8) or regular (rel32). The opcode determines the length of the rest of the instruction, either rel8 or rel32, not like a null-terminated string or something! http://www.felixcloutier.com/x86/JMP.html and http://www.felixcloutier.com/x86/Jcc.html. See also [How do I know whether MASM encodes my JMP instruction using a relative or absolute offset?](http://stackoverflow.com/q/38434627/224132.) — Peter Cordes, Jul 21 '16 at 02:51
@PeterCordes thanks you for correction. BTW, _the opcode on x86 defines length of instruction_ is OK for `jmp`, but generally would need explanation of prefixes, as `mov ax,#n` is the same opcode as `mov eax,#n` (if you don't count prefix as part of opcode). Hm, maybe I shouldn't have use the x86 at all, it's quite complicated after all those generations of backward compatibilities and extensions. While the OP is trying to work with fixed length made-up platform with much simpler architecture. Then again, if he can understand all this x86... his own platform should look easy after that. :D — Ped7g, Jul 21 '16 at 09:49
My point was that after looking at the prefixes + opcode and the current mode, the length of the rest of the instruction is always known without parsing the immediate. rel8 and re16/32 use different opcodes. If you want to talk about 16-bit mode vs. 32/64-bit modes (which I didn't :P), then in 16-bit mode it's `jmp rel16`. In 32/64-bit modes, it's `jmp rel32`. (So you potentially have to use `mov reg, imm64` / `jmp reg` to do an absolute indirect jump to reach code that's farther away than a 32bit relative displacement can reach). — Peter Cordes, Jul 21 '16 at 09:55
BTW, I also was totally confused by this question, and had serious doubts that the OP understood that each asm instruction normally translates to exactly one machine instruction. (Although some RISC assembly languages have pseudo-instructions for things like loading constants that don't fit in an immediate, and even [`addiu` with a constant that doesn't fit in a sign-extended imm16 assembling to multiple MIPS instructions](http://stackoverflow.com/a/38451961/224132)). — Peter Cordes, Jul 21 '16 at 09:59

How to implement CALL & RET to an assembler?

1 Answers1

ARM - RISC-like instruction set, fixed opcode size:

x86 - CISC-like historically extended beyond good taste