My understanding is that for good performance, you want your hash table large enough that chaining to another bucket is rare. The rarer it is, the less the occasional mispredict matters: branch prediction learns the common case, which avoids the path that leads to chaining to the next bucket, so only the rare chained lookups mispredict.
A good hash table will choose its hash function and collision-resolution strategy to distribute entries well, avoiding big buildups of long chains until the table is very nearly full. (That's if we're talking about "stealing" another bucket in the main table, rather than building a linked list for this entry. Linked lists are relatively slow to traverse / linear-search because of the data dependency in following each next pointer.)
Also note that a cache miss (or even an L3 cache hit) may cost more than a branch miss on modern CPUs with fast recovery, which don't have to wait for the out-of-order back-end to drain before starting recovery. (See: What exactly happens when a skylake CPU mispredicts a branch?)
Chaining to the adjacent bucket might not be optimal; I forget what the state of the art is. IIRC it isn't, because one chain creates a dense group of used-up buckets, so any other entries that hash to those buckets also have to chain, making the problem worse. You don't want to do more scattered memory fetches than you need.
Otherwise, if you were just chaining to adjacent entries in the array you're using as a hash table, then yes, branchlessly fetching some adjacent keys (from the same or next cache line) could be fine. If the entries are small enough, checking them in parallel with SIMD could possibly even be worth it. At least for integer keys or something else that can be compared efficiently; it wouldn't be worth doing 4 string-compares in parallel.
So yes, cache misses are bad. But if most of your lookups find their hit in the first bucket, then that's an easy pattern for branch prediction. In many use-cases for hash tables, the majority of lookups will be for existing entries that get used repeatedly.
More sophisticated branch predictors (like IT-TAGE in Intel Haswell and later) use recent branch history to form an index into the table of branch-prediction entries, so different paths leading to different kinds of lookups can get different predictions for the same branch in the same hash-table code. So e.g. one function that typically finds entries not-present might have its hash lookups predict correctly (if they don't chain), while another function that looks up known-present entries can also predict correctly.
Still, in the case where your hash table is starting to fill up and you have a non-negligible amount of chaining, branch mispredicts would be something to worry about. (And something to measure with hardware performance counters, e.g. perf stat, or perf record -e branch-misses ./my_hashtable_benchmark followed by perf report, on GNU/Linux.)