TL;DR: callers could pass an integer index, and subroutines could compare/branch on it to figure out which call-site to return to, with the set of call-sites hard-coded into each subroutine (hopefully generated by automated tools that build these blocks of code).
With no indirect data addressing, you're going to need self-modifying code to loop over arrays, for example, like the Little Man Computer (see Programming arrays in LMC for an example).
Your machine is still Turing-complete (except for having limited RAM) without that; even a machine with just one instruction (dec-and-branch) is Turing complete without self-modifying code, but a program to implement some algorithms might be pretty big.
With no indirect jumps, only jumps with a destination embedded in their machine code, you can't have one instruction that returns directly to any of several possible callers without self-modifying or cross-modifying code. So it's not possible to do this efficiently, but you can hack something up with multiple conditional branches to figure out where to jump.
Self-modifying code is the normal way to implement subroutines on a machine like this, if it's a von Neumann architecture rather than a Harvard architecture where the program is fixed and not modifiable as data. (On a Harvard design, you normally would provide some indirection, since programs can't synthesize it with self-modifying code.)
For example, the caller stores a return address into the operand of a jmp in the callee, before jumping there. (Perhaps at a fixed distance before the symbol address, so functions return by jumping to that slot just before their start address.) That's still self-modifying machine code, which you asked to avoid, or at least cross-modifying.
Some historical machines worked essentially that way, with each function having a fixed slot for its caller to store a return address. The Ferranti Pegasus doesn't have indirect jumps, so self-modifying code was part of the calling convention. That convention isn't re-entrant, so recursion or mutual recursion don't work.
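To make the shape of that convention concrete, here's a rough C model with hypothetical names. C can't portably patch its own machine code, so the jmp operand that the caller would overwrite is modelled as a per-function data slot holding a function pointer; on the real machine it would be the operand bytes of a jump at the end of the subroutine.

```c
#include <stdio.h>

/* Hypothetical model of the "caller stores the return address into a
 * fixed slot in the callee" convention.  The slot stands in for the
 * operand of the callee's final jmp; one slot per function means the
 * convention is not re-entrant, so recursion doesn't work. */
static void (*subfoo_return)(void);     /* the "patched" jmp operand */

static void back_at_site0(void) { puts("returned to call site 0"); }
static void back_at_site1(void) { puts("returned to call site 1"); }

static void subfoo(void)
{
    /* ... the real work ... */
    subfoo_return();                    /* the jmp whose operand was stored */
}

int main(void)
{
    subfoo_return = back_at_site0;      /* caller "patches" the return jump */
    subfoo();

    subfoo_return = back_at_site1;
    subfoo();
    return 0;
}
```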
> or using an assembler to keep count of which instruction we are at)?
How would that help? If there's no simple way to return to any of multiple call sites in machine code (even knowing all the addresses), an assembler won't have the building blocks to build a ret or indirect-jump instruction.
An assembler can make it less cumbersome to implement a complex mechanism if we do have one, though.
Strictly avoiding modifying machine code at run-time
Without indirect jumps, the only way to get run-time variable control-flow is conditional branches. Each branch gives you two possible paths of execution (taken or fall-through).
Instead of a return address, a caller can pass an integer telling the subroutine which call-site to return to. The subroutine will have to know about all possible call-sites, having code like if (0==B) goto 0x1234 / if (1==B) goto 0x1246 / etc., or something equivalent after loading the return index into the A register, where those addresses are the instructions just past the mov/jump sequence at each call-site.
The caller can pass a return index along with other args, either in a fixed memory location (per function), or in a register for the callee to store until it's ready to return. (@BipedalJoe had basically the same idea, but this version uses an integer instead of condition-codes which subroutines would have to save / restore if they wanted to do much before returning.)
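As a sketch of that idea in C (hypothetical names; two globals stand in for the A and B registers, and goto labels stand in for machine addresses, since the whole layout is fixed at assembly time):

```c
#include <stdio.h>

static int A, B;        /* the accumulator and the return-index register */

static void demo(void)
{
    /* call-site 0: store the arg and the return index, then jump */
    A = 10; B = 0; goto subfoo;
foo_callsite0:
    printf("back at call-site 0, result = %d\n", A);

    /* call-site 1: same subroutine, different return index */
    A = 20; B = 1; goto subfoo;
foo_callsite1:
    printf("back at call-site 1, result = %d\n", A);
    return;

subfoo:
    A = A * 2;                          /* ... the real work ... */
    if (B == 0) goto foo_callsite0;     /* the "return": compare the index */
    if (B == 1) goto foo_callsite1;     /* against every known call-site   */
}

int main(void)
{
    demo();
    return 0;
}
```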
So you couldn't load new code and call it; you'd have to construct a switch block of conditional branches for it to use when returning.
Performance would suck, as every return goes through a sequence of compare/branch instructions, and it would be a nightmare to maintain by hand as you added new function calls. You'd definitely want an assembler + linker to create these sequences of branches. The total number of call-sites in your program is unbounded, except by program memory size and the width of an integer.
For subroutines with more than a couple call-sites, you'd actually use a tree of branches like nested ifs instead of an if / elseif chain. You'd still need a branch or jump for every possible return address, but any single execution would only run about log2(n) branches before returning, instead of O(n), where n is the number of call-sites. (Like a compiler might do for a big switch if it didn't or couldn't use a jump table.)
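A rough C sketch of that tree shape for 8 return indices (hypothetical site0..site7 labels; in the real thing each leaf goto would be the jump back to a call-site):

```c
#include <stdio.h>

/* Nested range checks: any single dispatch runs about log2(8) = 3
 * compares instead of up to 8 for a linear if/else-if chain. */
static void return_to(int idx)
{
    if (idx <= 3) {
        if (idx <= 1) { if (idx == 0) goto site0; else goto site1; }
        else          { if (idx == 2) goto site2; else goto site3; }
    } else {
        if (idx <= 5) { if (idx == 4) goto site4; else goto site5; }
        else          { if (idx == 6) goto site6; else goto site7; }
    }
site0: puts("jumped back to site0"); return;
site1: puts("jumped back to site1"); return;
site2: puts("jumped back to site2"); return;
site3: puts("jumped back to site3"); return;
site4: puts("jumped back to site4"); return;
site5: puts("jumped back to site5"); return;
site6: puts("jumped back to site6"); return;
site7: puts("jumped back to site7"); return;
}

int main(void)
{
    for (int i = 0; i < 8; i++)
        return_to(i);
    return 0;
}
```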
If you aren't doing indirect calls, each call-site can only be returned to by one subroutine: the one it jumps to after storing args and the return index to memory (or loading them into A and B). So there wouldn't be a code-size saving from having all subroutines share one big switch tree. The same subroutine might make multiple calls to different subroutines, but each call-site within it has its own address.
I think the best strategy to do multiple branches on one value would be to get it into the B register; then you can li A, 3 and sub A, B, setting flags and destroying the constant in A, but leaving B (the return index) unchanged. So if you had 8 possible return addresses, index 0..7, you'd branch on CF from 3-B to select between the 0..3 range and the 4..7 range. Then within each of those, branch again on 1-B and 5-B respectively.
```
subfoo:
    # ... the real work ...

    load A, foo_ret_index  # static storage for our return index
    mov  B, A
    li   A, 3
    sub  A, B           # assuming sub sets the Carry flag on borrow, like x86
                        # (opposite of ARM), so jb (below) is jump-if-carry-set.
                        # Otherwise of course jae is the available jump.
    jb   ret4to7        # jump on unsigned below; if ( 3u < B ) goto ret4to7

    # return index 0..3
    li   A, 1
    sub  A, B
    jb   ret2to3

    # return address 0..1, with FLAGS set from 1 - idx
    ############ these are the actual jumps back to callers ############
    je   foo_callsite1  # jump if equal, based on the Zero flag
    jmp  foo_callsite0  # only possibility left is idx==0

ret2to3:                # B = 2 (A=-1) or B = 3 (A=-2).
    # Not sure if we can avoid an LI, using NAND, NOR, ADD, or SUB
    # to set CF or ZF based on those values.
    li   A, 2
    sub  A, B
    je   foo_callsite2  # idx == 2
    jmp  foo_callsite3  # only possibility left is idx==3

ret4to7:
    # ... similar code here, using 5-B to split 4..5 from 6..7 ...
```
A is the only possible destination for load and li; I'm writing 2-operand syntax just for clarity. Some ISAs with restrictions like that would leave the destination implicit, like li 3.
A chain of branches can be more compact, since we can get B=1 and then repeatedly subtract, checking ZF. Like if (--A == 0) goto dst.
```
    li   A, 1
    mov  B, A
    load A, foo_ret_index  # static storage for our return index
    sub  A, B
    jb   foo_callsite0     # taken if idx == 0 (unsigned 0 < 1 borrows)
    je   foo_callsite1     # idx == 1
    sub  A, B
    je   foo_callsite2     # idx == 2
    sub  A, B
    je   foo_callsite3     # idx == 3
    ...
```
After ruling out zero, we could subtract by 2, with a jb and je for every sub. But that would take more overhead at the start; it takes 3 instructions to get a new constant into B and reload A, so it's only worthwhile for a long chain, where you might want a tree anyway unless optimizing purely for code size.
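A rough C model of that pairing, assuming index 0 was peeled off first (hypothetical site names; the a < 2 test plays the role of the borrow/jb check and a == 0 the role of the ZF/je check on each subtraction):

```c
#include <stdio.h>

static const char *site[] = {
    "foo_callsite0", "foo_callsite1", "foo_callsite2", "foo_callsite3",
    "foo_callsite4", "foo_callsite5", "foo_callsite6", "foo_callsite7",
};

/* One subtract-by-2 per pair of indices: the borrow case catches the
 * odd index just below, the zero case catches the even index itself. */
static const char *pick_site(unsigned idx)
{
    if (idx == 0)
        return site[0];                 /* ruled out before the chain */
    unsigned a = idx;
    for (unsigned even = 2; even < 8; even += 2) {
        if (a < 2)                      /* sub would borrow: CF set, i.e. jb */
            return site[even - 1];
        a -= 2;
        if (a == 0)                     /* ZF set, i.e. je */
            return site[even];
    }
    return site[7];                     /* the last odd index left over */
}

int main(void)
{
    for (unsigned i = 0; i < 8; i++)
        printf("index %u -> %s\n", i, pick_site(i));
    return 0;
}
```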
An li B, imm instruction would help a lot. Or if load / mov set FLAGS, then we can check for 0 without a sub first. I was assuming they wouldn't, since most ISAs don't do that. Most ISAs have a way to test A,A or something to set FLAGS according to an AND, but you only have NAND/NOR, which would invert the register. I guess we could actually work with that, using add and counting up toward 0 after doing idx = ~idx = 0u - idx - 1.
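A quick C check of the identity that last idea leans on (assuming a 1 could be parked in B before loading the index): after NAND A,A the register holds ~idx = 0u - idx - 1, so repeatedly adding 1 and testing ZF hits zero on the (idx+1)-th add, giving one je-style check per possible return index with no sub needed.

```c
#include <stdio.h>

int main(void)
{
    for (unsigned idx = 0; idx < 4; idx++) {
        unsigned a = ~idx;          /* what NAND A,A would leave in A */
        unsigned checks = 0;
        do {
            a += 1;                 /* add the 1 kept in B */
            checks++;               /* one "je foo_callsiteN" per add */
        } while (a != 0);
        printf("index %u reaches zero on check #%u -> foo_callsite%u\n",
               idx, checks, idx);
    }
    return 0;
}
```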
The same mechanism could be used to implement function pointers for indirect calls. And you might share the same return branch tree between subroutines that are all possible callees of some indirect-call sites, if you aren't using one global tree.
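For indirect calls, a call-site would pass an integer selecting one of the known possible callees, and the same compare-and-branch chain or tree does the dispatch; a tiny C sketch with hypothetical names:

```c
#include <stdio.h>

static int funcA(int x) { return x + 1; }
static int funcB(int x) { return x * 2; }

/* An "indirect call" without indirect jumps: every possible callee is
 * known at assembly time and gets its own compare-and-branch. */
static int call_by_index(int which, int arg)
{
    if (which == 0) return funcA(arg);
    if (which == 1) return funcB(arg);
    return -1;                          /* unknown callee index */
}

int main(void)
{
    printf("%d %d\n", call_by_index(0, 5), call_by_index(1, 5));
    return 0;
}
```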