For an array of random numbers, a least significant digit first (LSD) counting variation of radix sort is normally fastest for smaller arrays that fit within cache, while for larger arrays, an initial most significant digit first (MSD) pass to split the array into smaller sub-arrays that fit in cache is usually faster. Since the data is random, the main time overhead for a radix sort is the randomly distributed writes, which are not cache friendly when the array is significantly larger than cache. If the original and working arrays fit within cache, the random access writes don't incur a significant time penalty on most systems.
There is also a choice of base for a radix sort. For example, 32 bit numbers can be sorted in 4 passes using base 256 (8 bit "digits"). Base 65536 (16 bit "digits") only takes 2 passes, but the count / index array usually exceeds the size of the L1 and/or L2 caches, so it's not faster in most cases. For 64 bit numbers, four 11 bit "digits" and two 10 bit "digits" could be used to sort in 6 passes, instead of eight 8 bit "digits" in 8 passes. However, the 11/10 bit digit variation isn't going to be faster unless the array is large enough, and the distribution of the random numbers uniform enough, to use up most of the storage that holds the counts / indexes.
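To make the base-256 case concrete, here is a minimal sketch of an LSD counting radix sort for 32 bit unsigned values (function and variable names are my own illustration, not from the prior thread): 4 passes of 8 bit digits, alternating between the input array and a working buffer of the same size.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Minimal sketch: LSD counting radix sort for 32-bit values, base 256
// (8-bit digits), so 4 passes. Uses a working buffer the same size as the
// input; after an even number of passes the sorted data is back in `a`.
void radix_sort_u32(std::vector<uint32_t>& a)
{
    std::vector<uint32_t> buf(a.size());
    uint32_t* src = a.data();
    uint32_t* dst = buf.data();

    for (int pass = 0; pass < 4; ++pass) {
        size_t count[256] = {0};
        int shift = pass * 8;

        // Count occurrences of each digit value.
        for (size_t i = 0; i < a.size(); ++i)
            ++count[(src[i] >> shift) & 0xff];

        // Convert counts to starting indexes (exclusive prefix sum).
        size_t index = 0;
        for (int d = 0; d < 256; ++d) {
            size_t c = count[d];
            count[d] = index;
            index += c;
        }

        // Scatter elements to their buckets; these are the randomly
        // distributed writes mentioned above.
        for (size_t i = 0; i < a.size(); ++i)
            dst[count[(src[i] >> shift) & 0xff]++] = src[i];

        std::swap(src, dst);
    }
    // After 4 passes (an even count), the sorted result is in `a`.
}
```

The same structure extends to 64 bit keys with eight 8 bit passes, or six passes using the 11/10 bit digit split mentioned above, at the cost of larger count / index arrays.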
Link to a prior thread about radix sort variations:
Radix Sort Optimization
merge sort "best case"
For a standard merge sort, the number of moves is always the same, but if the data is already sorted, only about half the number of compares are done.
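A sketch of where that saving comes from, assuming a top-down merge sort that alternates between a source and a destination buffer (names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// Merge step only. Every element is written to dst exactly once, so the
// move count is fixed. If the data is already sorted, the left run is
// exhausted after mid - lo compares and the rest of the right run is
// copied without further compares, which is the roughly 2x saving.
static void merge(const std::vector<int>& src, std::vector<int>& dst,
                  size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid;
    for (size_t k = lo; k < hi; ++k) {
        if (i < mid && (j >= hi || src[i] <= src[j]))
            dst[k] = src[i++];   // take from the left run
        else
            dst[k] = src[j++];   // take from the right run (no element compare once a run is empty)
    }
}
```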
quick sort / intro sort "best case"
Best case for quick sort is random data. Using the middle element as the pivot doesn't make much difference for random data, but if the data is already sorted, the middle element splits each partition evenly, so sorted data also ends up as a best case. Intro sort adds extra code to check whether the recursion is getting too deep, and switches to heap sort if it is. For random data this shouldn't happen, so the depth check is just extra overhead.
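A minimal sketch of that structure, assuming a middle-element pivot with Hoare partitioning and a simple depth limit (my own illustration, not any particular library's implementation):

```cpp
#include <algorithm>
#include <vector>

// Quick sort with the middle element as the pivot, plus a recursion-depth
// limit. If the limit is hit (degenerate partitions), the remaining range
// is finished with heap sort. For random data the limit is never reached,
// so the depth check is pure overhead.
void intro_sort(std::vector<int>& a, int lo, int hi, int depth_limit)
{
    while (hi - lo > 1) {
        if (depth_limit-- == 0) {
            // Too many levels of recursion: fall back to heap sort.
            std::make_heap(a.begin() + lo, a.begin() + hi);
            std::sort_heap(a.begin() + lo, a.begin() + hi);
            return;
        }
        // Move the middle element to the front and use it as the pivot
        // (Hoare partition). On already-sorted data this splits evenly.
        std::swap(a[lo], a[lo + (hi - lo) / 2]);
        int pivot = a[lo];
        int i = lo - 1, j = hi;
        while (true) {
            do { ++i; } while (a[i] < pivot);
            do { --j; } while (a[j] > pivot);
            if (i >= j) break;
            std::swap(a[i], a[j]);
        }
        intro_sort(a, lo, j + 1, depth_limit);  // recurse on the left part
        lo = j + 1;                             // loop on the right part
    }
}
```

A typical depth limit is about 2·log2(n); with a reasonable pivot on random data the recursion stays well under that, so the heap sort path is never taken.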