I am on a mission of sorting somewhat large array of unsigned, 64-bit, randomly generated integers (over 5E7 elements). Can you direct me to a parallel sorting algorithm that might exhibit almost linear speedup at least in the case of random data?
I am working with Java, in case it makes any difference with regard to fast sorting.
Edit: Note that this question is primarily concerned with parallel sorts capable to achieve near-linear speedup. (Meaning, when the amount of executing cores grows from P to 2P, the time spent by a parallel sort drops to 55 - 50 percent of the computation performed on P cores.)