See @Marco van de Voort's answer for a list of simple "normal" ways to write branchless code (cmov/other predicated execution, cmp/setcc, copy flags to a GP register, etc). That forms a building block for making branchless versions of things which would be much better done with branches.
When you say "make decisions", it sounds like you're really asking if such a machine would be fully programmable.
Part of the Theory of Computation in computer science is figuring out how powerful different models of computation are. This isn't about how efficient a model is, it's about what can be computed with it at all. I suspect that a register machine without conditional branch instructions would not be equivalent in power to a Turing machine, aka Universal Computer. (Even if you allowed non-indirect unconditional branches).
Update: x86's mov
instruction is Turing-complete even without self-modifying code, just using normal indexed addressing modes. You also need a loop around the sequence of loads and stores. TODO: rewrite the parts of this answer invalidated by that. This increases the value of unconditional branches without self-modifying-code (e.g. letting IP
wrap around in real mode, or other weird tricks for doing this without a jmp
, see below).
I think the mov
-only Turing Machine is sort of simulating another machine in the registers and memory of x86, so instead of the x86 [R/E]IP branching, a value in a general-purpose register behaves like a program counter that's branching. (Or a Turing machine that's traversing a tape)
You can make very stripped down architectures which are still Univeral Computers, but what I gather from what I've seen so far is that any model of computation similar to a register machine (like x86, ARM, etc.) needs some version of a conditional branch to be a Universal Computer.
See also this SO question: Minimal instruction set to solve any problem with a computer program, especially the Bit-manipulating machine part of Wikipedia's One Instruction Set Computer article; it's least like a conditional branch, but still does the same job.
I don't know enough theory of computation to put accurate bounds or a formal description on what you can compute without conditional branches (on a register machine like x86 or ARM), but I can say something:
I can say that given enough space for instructions, you could sort an array, but maybe only if you generated the right sequence of instructions for that array length. Using branchless techniques like cmov, you can conditionally swap two memory locations based on a comparison. A good choice of algorithm would be a sorting network.
You can fully unroll any loop of known length, and maybe handle variable-length loops by unrolling the maximum number of iterations you will ever need. You can use branchless techniques to load/store from dummy addresses when your loop counter is past the end of the array. (e.g. use cmov to set your pointer to point to a small static array that only exists for you to store garbage into without affecting anything else. Your code will always do the same fixed number of stores, but with different inputs different ones of those stores can go into the dumping ground.) ARM predicated execution of any instruction means you can do conditional stores without needing a static dumping ground.
Fully unrolling Merge Sort should work well, because it always does the same amount of work for a given array size. Quick Sort, on the other hand, would be very inconvenient because the amount of work varies with the data and choice of partition. Moreover, it has an O(n^2) worst case, so you'd need at least that much code.
Since you have to fully unroll anything that you want to run, it means you need to know how long it will run. This means you can't implement a full Turing Machine, because not all Turing Machines do halt. (Not to mention that it's impossible for an algorithm to exist that decides whether any given Turing Machine halts).
Self-modifying code might increase the efficiency of this, but I don't think it could increase the computational power if you're still not allowed to rewrite your own machine code to include branch instructions. (We have to exclude SMC from generating any kinds of branches, not just conditional-branch instructions, because we could conditionally produce an unconditional-branch instruction.)
I think SMC in a fixed loop is probably more powerful than non-self-modifying code in a loop, even if the SMC isn't allowed to generate jump instructions. See below for some crazy ideas for looping without branch instructions (e.g. timer interrupt and IP
wraparound).
With unconditional jumps only (no conditional jumps), you can make loops to handle arbitrary-size versions of the same problems as without jumps. But then your program can't terminate. If there was some kind of "terminate program" instruction, putting it inside the loop would make it a non-loop.
So what you could end up with is a program that eventually has an array sorted (or whatever its job was), and maybe it signals this fact by storing a "true" somewhere, and keeps looping without affecting the data (using branchless conditional code). You can sort of think of this as having a "transport-triggered" exit function, so it can be used conditionally.
More ways to emulate an unconditional jump so you can create a loop, I think increasing the power of self-modifying code. (Even if you refrain from using that power to conditionally branch).
function-call instructions can easily be abused them to implement plain jumps, so we'd obviously want to exclude them.
On x86, you could use call
to jump to code that just pops the return address of the stack instead of ever returning. (So you could implement a loop). To build a conditional branch out of x86's indirect call
instruction, you'd use branchless techniques (like cmov) to get either the instruction after call or a different destination into a register, and run call rax
. Or more simply, with eax set to 0 or 1 (e.g. from setcc), you could run call [target_table + rax*8]
to select between two destinations for the branch.
Similarly, ret
is easy to abuse as an indirect jump: push
an address and run ret
. (Don't do this in real life: CPUs are optimized for correct call/ret nesting, and mismatching them causes the return-address predictor stack to mis-predict on future RETs.)
Interrupts cause the CPU to jump to the interrupt handler. Software interrupts (like x86's int 0x80
instruction, or ARM svc
/ swi
) synchronously jump to the interrupt handler, so they're effectively just a function all (and a mode switch into kernel mode).
So that means int
is another jump instruction we need to exclude. But we can also use the timer interrupt to loop. The interrupt handler is the top of the loop. Do whatever amount of work you want and then re-enable interrupts. Execute NOPs after that; the next timer interrupt will eventually fire. Or on a machine that leaves interrupts enabled while servicing interrupts: limit your loop body to an amount of work that can always fit between timer interrupts, even with worst-case cache behaviour.
Another way for software to generate an interrupt is by running an illegal instruction. Self-modifying code can conditionally branch to the illegal-instruction interrupt handler by conditionally generating an illegal instruction. Or conditionally branch to the division-error handler by conditionally dividing by 0. Since you can set the address of that interrupt handler (e.g. by storing into Interrupt Vector/Descriptor Table on x86), you can conditionally branch to an arbitrary address using a store and a divide by 0. (All of your branch targets need to do whatever is necessary to re-enable that interrupt as being triggerable, though).
Let the program counter wrap around: On a flat-memory von-Neumann architecture machine, this requires self-modifying code to keep track of the program state, because there's no memory that doesn't execute as code.
In segmented x86 (e.g. 16-bit Real Mode) IP
will wrap around from 0xFFFF to 0x0000. So your loop is the whole code segment, and you have separate data storage in other segments. You can only modify CS
with a far jmp
or far call
/far ret
(or interrupt / interrupt-return), so you can't do anything like pop cs
to jump (because pop cs
doesn't exist, unlike the other pop-segment-register instructions).
Fun fact: The last instruction at the end of the code segment doesn't have to end at 0xFFFF. e.g. a multi-byte instruction could start at 0xFFFF and end at 0x04. It's possible to write x86 machine code that executes different ways if decoded from different start points, but this is not useful here. (This imposes huge constraints on what you can do).
ARM makes the program counter accessible as a normal general-purpose register, so you can do control-flow with sub r15, #1024
or something, not technically using a b
(branch) or any kind of call (bl
) or return instruction. You're still directly modifying the program counter, though, so it's really just a branch instruction. (pc
is another name for r15
).