TL;DR: callers could pass an integer index, and subroutines could compare/branch on it to figure out which call-site to return to, with the set of call-sites hard-coded into each subroutine (hopefully generated by automated tools that build these blocks of code).
With no indirect data addressing, you're going to need self-modifying code to loop over arrays, for example, like the Little Man Computer (see Programming arrays in LMC for an example).
Your machine is still Turing-complete (except for having limited RAM) without that; even a machine with just one instruction (dec-and-branch) is Turing complete without self-modifying code, but a program to implement some algorithms might be pretty big.
With no indirect jumps, only jumps with a destination embedded in their machine code, you can't have one instruction that returns directly to any of several possible callers without self-modifying or cross-modifying code. So it's not possible to do this efficiently, but you can hack something up with multiple conditional branches to figure out where to jump.
Self-modifying code is the normal way to implement subroutines on a machine like this, if it's a von Neumann architecture rather than a Harvard architecture where the program is fixed and not modifiable as data. (On a Harvard design, you normally would provide some indirection, since programs can't synthesize it with self-modifying code.)
For example, the caller stores a return address into the operand of a jmp in the callee, before jumping there. (Perhaps at a fixed distance before the symbol address, so functions return by jumping to that slot just before their start address.) That's still self-modifying machine code, which you asked to avoid, or at least cross-modifying.
Some historical machines worked essentially that way, with each function having a fixed slot for its caller to store a return address. The Ferranti Pegasus doesn't have indirect jumps, so self-modifying code was part of the calling convention. That convention isn't re-entrant, so recursion or mutual recursion don't work.
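To make the shape of that convention concrete, here's a rough C model with hypothetical names. C can't portably patch its own machine code, so the jmp operand that the caller would overwrite is modelled as a per-function data slot holding a function pointer; on the real machine it would be the operand bytes of a jump at the end of the subroutine.

```c
#include <stdio.h>

/* Hypothetical model of the "caller stores the return address into a
 * fixed slot in the callee" convention.  The slot stands in for the
 * operand of the callee's final jmp; one slot per function means the
 * convention is not re-entrant, so recursion doesn't work. */
static void (*subfoo_return)(void);     /* the "patched" jmp operand */

static void back_at_site0(void) { puts("returned to call site 0"); }
static void back_at_site1(void) { puts("returned to call site 1"); }

static void subfoo(void)
{
    /* ... the real work ... */
    subfoo_return();                    /* the jmp whose operand was stored */
}

int main(void)
{
    subfoo_return = back_at_site0;      /* caller "patches" the return jump */
    subfoo();

    subfoo_return = back_at_site1;
    subfoo();
    return 0;
}
```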
> or using an assembler to keep count of which instruction we are at)?
How would that help? If there's no simple way to return to any of multiple call sites in machine code (even knowing all the addresses), an assembler won't have the building blocks to build a ret or indirect-jump instruction.
An assembler can make it less cumbersome to implement a complex mechanism if we do have one, though.
Strictly avoiding modifying machine code at run-time
Without indirect jumps, the only way to get run-time variable control-flow is conditional branches. Each branch gives you two possible paths of execution (taken or fall-through).
Instead of a return address, a caller can pass an integer telling the subroutine which call-site to return to. The subroutine will have to know about all possible call-sites, having code like if (0==B) goto 0x1234 / if (1==B) goto 0x1246 / etc., or something equivalent after loading the return index into the A register, where those addresses are the instructions just past the mov/jump sequence at each call-site.
The caller can pass a return index along with other args, either in a fixed memory location (per function), or in a register for the callee to store until it's ready to return. (@BipedalJoe had basically the same idea, but this version uses an integer instead of condition-codes which subroutines would have to save / restore if they wanted to do much before returning.)
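As a sketch of that idea in C (hypothetical names; two globals stand in for the A and B registers, and goto labels stand in for machine addresses, since the whole layout is fixed at assembly time):

```c
#include <stdio.h>

static int A, B;        /* the accumulator and the return-index register */

static void demo(void)
{
    /* call-site 0: store the arg and the return index, then jump */
    A = 10; B = 0; goto subfoo;
foo_callsite0:
    printf("back at call-site 0, result = %d\n", A);

    /* call-site 1: same subroutine, different return index */
    A = 20; B = 1; goto subfoo;
foo_callsite1:
    printf("back at call-site 1, result = %d\n", A);
    return;

subfoo:
    A = A * 2;                          /* ... the real work ... */
    if (B == 0) goto foo_callsite0;     /* the "return": compare the index */
    if (B == 1) goto foo_callsite1;     /* against every known call-site   */
}

int main(void)
{
    demo();
    return 0;
}
```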
So you couldn't load new code and call it; you'd have to construct a switch block of conditional branches for it to use when returning.
Performance would suck, as every return goes through a sequence of compare/branch instructions, and it would be a nightmare to maintain by hand as you added new function calls. You'd definitely want an assembler + linker to create these sequences of branches. The total number of call-sites in your program is unbounded, except by program memory size and the width of an integer.
For subroutines with more than a couple call-sites, you'd actually use a tree of branches like nested ifs instead of an if / elseif chain. You'd still need a branch or jump for every possible return address, but any single execution would only run about log2(n) branches before returning, instead of O(n), where n is the number of call-sites. (Like a compiler might do for a big switch if it didn't or couldn't use a jump table.)
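A rough C sketch of that tree shape for 8 return indices (hypothetical site0..site7 labels; in the real thing each leaf goto would be the jump back to a call-site):

```c
#include <stdio.h>

/* Nested range checks: any single dispatch runs about log2(8) = 3
 * compares instead of up to 8 for a linear if/else-if chain. */
static void return_to(int idx)
{
    if (idx <= 3) {
        if (idx <= 1) { if (idx == 0) goto site0; else goto site1; }
        else          { if (idx == 2) goto site2; else goto site3; }
    } else {
        if (idx <= 5) { if (idx == 4) goto site4; else goto site5; }
        else          { if (idx == 6) goto site6; else goto site7; }
    }
site0: puts("jumped back to site0"); return;
site1: puts("jumped back to site1"); return;
site2: puts("jumped back to site2"); return;
site3: puts("jumped back to site3"); return;
site4: puts("jumped back to site4"); return;
site5: puts("jumped back to site5"); return;
site6: puts("jumped back to site6"); return;
site7: puts("jumped back to site7"); return;
}

int main(void)
{
    for (int i = 0; i < 8; i++)
        return_to(i);
    return 0;
}
```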
If you aren't doing indirect calls, each call-site can only be returned to by one subroutine: the one it jumps to after storing args and the return index to memory (or loading them into A and B). So there wouldn't be a code-size saving from having all subroutines share one big switch tree. The same subroutine might make multiple calls to different subroutines, but each call-site within it has its own address.
I think the best strategy to do multiple branches on one value would be to get it into the B register; then you can li A, 3 and sub A, B, setting flags and destroying the constant in A, but leaving B (the return index) unchanged. So if you had 8 possible return addresses, index 0..7, you'd branch on CF from 3-B to select between the 0..3 range and the 4..7 range. Then within each of those, branch again on 1-B and 5-B respectively.
```
subfoo:
    # ... the real work ...

    load A, foo_ret_index  # static storage for our return index
    mov  B, A
    li   A, 3
    sub  A, B           # assuming sub sets the Carry flag on borrow, like x86
                        # (opposite of ARM), so jb (below) is jump-if-carry-set.
                        # Otherwise of course jae is the available jump.
    jb   ret4to7        # jump on unsigned below; if ( 3u < B ) goto ret4to7

    # return index 0..3
    li   A, 1
    sub  A, B
    jb   ret2to3

    # return address 0..1, with FLAGS set from 1 - idx
    ############ these are the actual jumps back to callers ############
    je   foo_callsite1  # jump if equal, based on the Zero flag
    jmp  foo_callsite0  # only possibility left is idx==0

ret2to3:                # B = 2 (A=-1) or B = 3 (A=-2).
    # Not sure if we can avoid an LI, using NAND, NOR, ADD, or SUB
    # to set CF or ZF based on those values.
    li   A, 2
    sub  A, B
    je   foo_callsite2  # idx == 2
    jmp  foo_callsite3  # only possibility left is idx==3

ret4to7:
    # ... similar code here, using 5-B to split 4..5 from 6..7 ...
```
A is the only possible destination for load and li; I'm writing 2-operand syntax just for clarity. Some ISAs with restrictions like that would leave the destination implicit, like li 3.
A chain of branches can be more compact, since we can get B=1 and then repeatedly subtract, checking ZF. Like if (--A == 0) goto dst.
```
    li   A, 1
    mov  B, A
    load A, foo_ret_index  # static storage for our return index
    sub  A, B
    jb   foo_callsite0     # taken if idx == 0 (unsigned 0 < 1 borrows)
    je   foo_callsite1     # idx == 1
    sub  A, B
    je   foo_callsite2     # idx == 2
    sub  A, B
    je   foo_callsite3     # idx == 3
    ...
```
After ruling out zero, we could subtract by 2, with a jb and je for every sub. But that would take more overhead at the start; it takes 3 instructions to get a new constant into B and reload A, so it's only worthwhile for a long chain, where you might want a tree anyway unless optimizing purely for code size.
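A rough C model of that pairing, assuming index 0 was peeled off first (hypothetical site names; the a < 2 test plays the role of the borrow/jb check and a == 0 the role of the ZF/je check on each subtraction):

```c
#include <stdio.h>

static const char *site[] = {
    "foo_callsite0", "foo_callsite1", "foo_callsite2", "foo_callsite3",
    "foo_callsite4", "foo_callsite5", "foo_callsite6", "foo_callsite7",
};

/* One subtract-by-2 per pair of indices: the borrow case catches the
 * odd index just below, the zero case catches the even index itself. */
static const char *pick_site(unsigned idx)
{
    if (idx == 0)
        return site[0];                 /* ruled out before the chain */
    unsigned a = idx;
    for (unsigned even = 2; even < 8; even += 2) {
        if (a < 2)                      /* sub would borrow: CF set, i.e. jb */
            return site[even - 1];
        a -= 2;
        if (a == 0)                     /* ZF set, i.e. je */
            return site[even];
    }
    return site[7];                     /* the last odd index left over */
}

int main(void)
{
    for (unsigned i = 0; i < 8; i++)
        printf("index %u -> %s\n", i, pick_site(i));
    return 0;
}
```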
An li B, imm instruction would help a lot. Or if load / mov set FLAGS, then we can check for 0 without a sub first. I was assuming they wouldn't, since most ISAs don't do that. Most ISAs have a way to test A,A or something to set FLAGS according to an AND, but you only have NAND/NOR, which would invert the register. I guess we could actually work with that, using add and counting up toward 0 after doing idx = ~idx = 0u - idx - 1.
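A quick C check of the identity that last idea leans on (assuming a 1 could be parked in B before loading the index): after NAND A,A the register holds ~idx = 0u - idx - 1, so repeatedly adding 1 and testing ZF hits zero on the (idx+1)-th add, giving one je-style check per possible return index with no sub needed.

```c
#include <stdio.h>

int main(void)
{
    for (unsigned idx = 0; idx < 4; idx++) {
        unsigned a = ~idx;          /* what NAND A,A would leave in A */
        unsigned checks = 0;
        do {
            a += 1;                 /* add the 1 kept in B */
            checks++;               /* one "je foo_callsiteN" per add */
        } while (a != 0);
        printf("index %u reaches zero on check #%u -> foo_callsite%u\n",
               idx, checks, idx);
    }
    return 0;
}
```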
The same mechanism could be used to implement function pointers for indirect calls. And you might share the same return branch tree between subroutines that are all possible callees of some indirect-call sites, if you aren't using one global tree.
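For indirect calls, a call-site would pass an integer selecting one of the known possible callees, and the same compare-and-branch chain or tree does the dispatch; a tiny C sketch with hypothetical names:

```c
#include <stdio.h>

static int funcA(int x) { return x + 1; }
static int funcB(int x) { return x * 2; }

/* An "indirect call" without indirect jumps: every possible callee is
 * known at assembly time and gets its own compare-and-branch. */
static int call_by_index(int which, int arg)
{
    if (which == 0) return funcA(arg);
    if (which == 1) return funcB(arg);
    return -1;                          /* unknown callee index */
}

int main(void)
{
    printf("%d %d\n", call_by_index(0, 5), call_by_index(1, 5));
    return 0;
}
```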