This is a computer architecture question that really calls for a long answer. I will try to keep it very simple, at the risk of being inaccurate.
You basically answered your own question by asking whether CUDA cores handle branch prediction: the answer is no.
A CPU core has to handle every kind of operation a computer performs: calculation, memory fetches, I/O, interrupts. It therefore has a large, complex instruction set, and it uses branch prediction to keep instruction fetch fast.
It also has a large cache and a high clock rate.
Implementing all of that takes more logic, hence more transistors and a higher cost per core than on a GPU.
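
To give a feel for what branch prediction means, here is a minimal sketch of a 2-bit saturating counter, a common textbook scheme. It is an assumption chosen for illustration, not a description of what any particular CPU implements; the function name and the sample branch pattern are hypothetical.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count correct predictions for a sequence of branch outcomes.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    This is a textbook model, not any real CPU's predictor.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return correct


# A loop branch: taken 9 times, then not taken once when the loop exits.
loop_outcomes = [True] * 9 + [False]
hits = simulate_two_bit_predictor(loop_outcomes)
print(f"{hits} of {len(loop_outcomes)} predictions correct")
```

A regular pattern like a loop is predicted well, while a branch that alternates every time is not. The hardware that tracks this state for many branches is part of the extra logic a CPU core pays for.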
GPU cores have less cache, a simpler instruction set, and a lower clock rate; however, they are optimized to do more calculation as a group.
The simpler instruction set and the smaller cache make them less expensive per core.
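
To illustrate working "as a group" without branch prediction, here is a rough Python simulation of how a group of GPU threads can deal with a branch: every lane steps through the same instructions, and lanes that did not take a given path are simply masked off. This is a simplified model under my own assumptions (the function name, the 4-lane group, and the even/odd branch are all hypothetical; on NVIDIA hardware the group, called a warp, is 32 threads wide).

```python
def run_warp_branch(values):
    """Apply `x * 2 if x is even else x + 1` to every lane, lockstep style.

    Returns (results, passes), where passes is the number of branch paths
    the whole group had to step through. Simplified model, not real hardware.
    """
    # One mask bit per lane: True if the lane takes the "then" path.
    mask_then = [v % 2 == 0 for v in values]
    results = list(values)
    passes = 0

    # Pass 1: the "then" path runs only if at least one lane needs it.
    if any(mask_then):
        passes += 1
        for lane, active in enumerate(mask_then):
            if active:
                results[lane] = values[lane] * 2

    # Pass 2: the "else" path runs only if at least one lane needs it.
    if not all(mask_then):
        passes += 1
        for lane, active in enumerate(mask_then):
            if not active:
                results[lane] = values[lane] + 1

    return results, passes


uniform_results, uniform_passes = run_warp_branch([2, 4, 6, 8])
divergent_results, divergent_passes = run_warp_branch([1, 2, 3, 4])
print(uniform_passes, divergent_passes)
```

When all lanes agree, the group pays for one path; when they disagree, it pays for both. There is nothing to predict, which is one reason the per-core control logic can stay small.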