2

I have been working on different variations of the Radix Sort. At first I used chaining, which was really slow. Then I moved onto using a count sort while using val % (10 * pass), and most recently turning it into the respective bytes and count sorting those, which allows me to sort by negative values also.

I wanted to try it with multithreading, and can only get it to work about half the time. I was wondering if someone can help look at my code, and see where I'm going wrong with the threading. I have each thread count sort each byte. Thanks:

public class radixSort {

    public int[] array;
    public int arraySize, arrayRange;
    public radixSort (int[] array, int size, int range) {
        this.array = array;
        this.arraySize = size;
        this.arrayRange = range;
    }
    public int[] RadixSort() {
        Thread[] threads = new Thread[4];
        for (int i=0;i<4;i++)
            threads[i] = new Thread(new Radix(arraySize, i));
        for (int i=0;i<4;i++)
            threads[i].start();
        for (int i=0;i<4;i++)
            try {
                threads[i].join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        return array;
    }
    class Radix implements Runnable {
        private int pass, size;
        private int[] tempArray, freqArray;
        public Radix(int size, int pass) {
            this.pass = pass;
            this.size = size;
            this.tempArray = new int[size];
            this.freqArray = new int[256];
        }
        public void run() {
            int temp, i, j;
            synchronized(array) {
                for (i=0;i<size;i++) {
                    if (array[i] <= 0) temp = array[i] ^ 0x80000000;
                    else temp = array[i] ^ ((array[i] >> 31) | 0x80000000);
                    j = temp >> (pass << 3) & 0xFF;
                    freqArray[j]++;
                }
                for (i=1;i<256;i++)
                    freqArray[i] += freqArray[i-1];
                for (i=size-1;i>=0;i--) {
                    if (array[i] <= 0) temp = array[i] ^ 0x80000000;
                    else temp = array[i] ^ ((array[i] >> 31) | 0x80000000);
                    j = temp >> (pass << 3) & 0xFF;
                    tempArray[--freqArray[j]] = array[i];
                }
                for (i=0;i<size;i++)
                    array[i] = tempArray[i];
            }
        }
    }
}
Kevin Melkowski
  • 463
  • 1
  • 5
  • 17

3 Answers3

2

There is a basic problem with this approach. To get a benefit from multithreading, you need to give each thread a non-overlapping task compared to the other treads. By synchonizing on the array you have made it so only one thread does work at a time, meaning you get all the overhead of threads with none of the benefit.

Think of ways to partition the task so that threads work in parallel. For example, after the first pass, all the item with a 1 high bit will be in one part of the array, and those with a zero high-bit will be in the other. You could have one thread work on each part of the array without synchronizing.

Note that your runnable has to completely change so that it does one pass at a specified subset of the array then spawns threads for the next pass.

Mel Nicholson
  • 3,225
  • 14
  • 24
  • I see on the synchronized part, I just read somewhere to use the same data structure, I'd have to use that. I think what you mean by partitioning is splitting it in half'ish, and radix sorting the negatives and positives, then merging the two together. I've changed the values of the bytes, in order to work my way around this. I'm just wondering if instead of doing the 4 passes, they could be done at the same time. – Kevin Melkowski Dec 17 '12 at 20:52
  • Actually, you would not need to merge. Your thread constructor would take the arguments `startIndex, endIndex, bit` -- the thread would sort only the part of the array assigned to it for the assigned bit, then spawn a new thread to sort each subset for the next bit. Since the subset are already in the right order, the total list is also. The reason you do not need to synchronize in that case is that not two threads will look at the same value. Optimization: let the current thread do one of the halves rather than spawning two, and abort any subset with size<2 – Mel Nicholson Dec 17 '12 at 21:03
0

Besides wrong class and method names (class should start with capital letter, method shouldn't), I can see that you are synchronizing all thread works on the array. So it's in fact not parallel at all.

Jan Krakora
  • 2,500
  • 3
  • 25
  • 52
0

I am pretty sure that you can't really parallelize RadixSort, at least in the way you are trying to. Someone pointed out that you can do it by divide-and-conquer, as you first order by the highest bits, but in fact, RadixSort works by comparing the lower-order bits first, so you can't really divide-and-conquer. The array can basically be completely permuted after each pass.

Guys, prove me wrong, but i think it's inherently impossible to parallelize this algorithm like you try to. Maybe you can parallelize the (count) sorting that is done inside of each pass, but be aware that ++ is not an atomic operation.

kutschkem
  • 7,826
  • 3
  • 21
  • 56
  • This is wrong. RadixSort works fine starting with the high-order bits. – Mel Nicholson Dec 17 '12 at 21:10
  • You would do divide-and-conquer in that case, right? Ok, i can see how it would work, but why the heck is it then described so complicated in, e.g. "Introduction to Algorithms"? I mean, this means you would not necessarily need your underlying sorting algorithm to be stable, do you? Is there some hidden catch about the theoretical runtime? – kutschkem Dec 17 '12 at 21:17
  • I never did like the Intro to Aglo's book, they make everything way too complicated. – Kevin Melkowski Dec 17 '12 at 21:26