But if the CPU predicts the branch above incorrectly, it would divide
by zero in the following instructions. This doesn't happen though, and
I was wondering why?
It may well happen. The real question is: is it observable? Obviously, this speculative division by zero does not and should not "crash" the CPU, but a non-speculative division by zero does not crash it either. There is a long causal chain between the division by zero and your process exiting with an error message. It goes somewhat like this (on POSIX, x86):
- The ALU or the microcode responsible for the division flags the division by zero as an error.
- The interrupt descriptor for vector 0 is loaded (vector 0 is the divide error, #DE, on x86).
- A set of registers (including the current program counter) is pushed onto the stack. The corresponding cache lines may need to be fetched first from RAM.
- The interrupt handler is executed (a piece of kernel code). It raises a SIGFPE signal in the current process.
- Eventually, signal handling decides that the default action shall be taken (assuming you didn't install a handler), which is to terminate the process. The error message is then typically displayed by the parent process, such as your shell.
- This takes many additional steps (e.g. use of device drivers) until eventually there is a change observable by the user, namely some graphics output by memory-mapped I/O.
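The observable end of this chain can be checked from user space. The following is a minimal sketch, assuming a POSIX system on x86, where an integer division by zero traps; `child_termination_signal` is a hypothetical helper name, not part of any library. It performs the division in a forked child and reports which signal, if any, terminated it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: perform numerator / denominator in a child process.
 * Returns the number of the signal that terminated the child, 0 if the
 * child exited normally, or -1 if fork() or waitpid() failed. */
int child_termination_signal(int numerator, int denominator)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make sure the default action for SIGFPE is in effect. */
        signal(SIGFPE, SIG_DFL);

        /* volatile keeps the compiler from folding or removing the division.
         * Assumption: on x86 this executes a real divide instruction, which
         * raises #DE when the denominator is zero. */
        volatile int n = numerator;
        volatile int d = denominator;
        volatile int q = n / d;
        (void)q;
        _exit(0);
    }

    /* Parent: wait for the child and inspect how it ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On such a system, `child_termination_signal(1, 0)` should return `SIGFPE`, while `child_termination_signal(6, 3)` should return 0. Other architectures may not trap on integer division by zero at all, so the result there can differ.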
This is a lot of work compared to a simple, error-free division, and a lot of it could be executed speculatively: basically anything up to the actual memory-mapped I/O, or until the finite set of resources for speculative execution (e.g. shadow registers and temporary cache lines) is exhausted. The latter is likely to happen much, much sooner. In that case, the speculative branch needs to be suspended until it is clear whether it is actually taken. If it is, the changes are committed, and once they are written the speculative execution resources can be released. If it is not, the changes are discarded.
The important bit is: as long as none of the speculative execution state becomes visible to other threads, other speculative branches on the same thread, or other hardware (such as graphics), anything goes for optimization. Realistically, however, MSalters is absolutely right that a CPU designer would not care to optimize for this use case. So I agree that a real CPU will probably just suspend the speculative branch once the error flag is set. This costs at most a few cycles, and only if the error is actually legitimate, which is unlikely because the pattern you described is common. Doing speculative execution past this point would only divert precious optimization resources from more important cases.
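For reference, the common pattern in question looks roughly like the sketch below. The question's original code is not reproduced here, so `checked_div` and its parameters are hypothetical names chosen for illustration. Whatever the CPU does speculatively behind the branch, the architectural result is that the division only takes effect when the denominator is non-zero.

```c
#include <stdbool.h>

/* Hypothetical guarded division: writes the quotient and returns true only
 * when the denominator is non-zero. The INT_MIN / -1 overflow case is not
 * handled here. */
bool checked_div(int numerator, int denominator, int *quotient)
{
    if (denominator == 0)
        return false;   /* the division below is never architecturally executed */

    *quotient = numerator / denominator;
    return true;
}
```

A mispredicted branch may let the CPU start the division with a zero denominator, but `*quotient` is left untouched and no signal is delivered, because the speculative work is discarded before it becomes visible.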
(In fact, the only processor exception I would want to make reasonably fast, were I a CPU designer, is a specific type of page fault, where the page is known and accessible, but the "present" flag is cleared, just because this happens commonly when virtual memory is used, and is not a true error. Even this case is not terribly important, though, because the disk access on swapping, or even just memory decompression, is typically much more expensive.)