I am a newbie, trying to edit a program. I have a MPI Program that divide array into subsets, the master sends the subsets to the slaves, they doo a quicksort and then return the sorted numbers to the master so he can write them in a file. What I am trying to do is make the quick sort happen even quicker. My idea is to make the master divide the array and sends subsets to the slaves but keeping one for himself. Then dividing them again to new subsets (for example if we have numbers from 1 to 100 in the array the new subsets should be from 1 to 25, 26 to 50, 51 to 75 and 76 to 100) and then keep the first subset (1 to 25) for himself, send the second (26 to 50) to the first slave, the third one (51 to 76) to the second slave and etc. The slaves should do the same. Then it should perform a quicksort and the slave should return the sorted numbers to the master. I am hoping that this way the sort should be faster. The problem is that as I said I am a newbie and I need help with ideas, advices and even code so I can achieve my goal.
-
Are you really trying to solve an actual problem where sorting is too slow? Then forget this approach and read about [parallel sorting algorithms](http://stackoverflow.com/a/3969847/620382), better yet think/ask about how you can keep your data distributed in the first place. Or is this an exercise and you are intending to learn about how to use MPI? – Zulan Mar 27 '16 at 16:32
-
It's simply an excercise, so I can learn some MPI. – Alex Alex Mar 27 '16 at 16:55
-
Any idea how to make the slave dividing one more time the sub-array in range? For example if the numbers in the code are from 0 to 100 I would need them divided in 4 groups (1 master + 3 slaves) in ranges: 0-25/26-50/51-75/76-100. – Alex Alex Apr 03 '16 at 16:27
2 Answers
For this answer I am going to stick with the assumption that this should be done with Quicksort, and that the data is read on a single process. Just keep in mind that there are many sophisticated parallel sorting techniques.
Your idea of separating the numbers by subsets is problematic, because it makes assumptions about the shape of data. For non-uniformly distributed data sets it won't even help to know the minimum and maximum. It is better to simply send out equal amount of numbers to each process, let them sort and afterwards merge the data.
For the merge you start with ntasks
sorted sub-lists and want to end up with a single one. A naive merge would repeatedly look for the minimal element in each sub-list, remove that and append it to the final list. This needs ntasks
* N
comparisons, N
swaps and N * 2
memory. You can optimize the comparisons to log2(ntasks) * N
by doing an actual merge sort, but that also needs log2(ntasks) * N
swaps. You can further refine that by keeping the sub-lists (or pointers to their first element) in a priority queue, which should give you log2(ntasks) * N
comparisons and N
swaps.
About the usage of MPI:
- Do not use
MPI_Isend
&MPI_Wait
right after each other. In this case useMPI_Send
instead. Use the immediate variants only if you can actually do something useful between theMPI_Isend
andMPI_Wait
. - Use collective operations whenever possible. To distribute data from the root to all slaves, use
MPI_Scatter
orMPI_Scatterv
. The first requires all ranks to receive the same number of elements, which can also be achieved by padding. To collect data from the slaves in the master, useMPI_Gather
orMPI_Gatherv
.1 Collectives are more easy to get right, because they describe the high level operation. Their implementation is usually highly optimized. - To receive an unknown-size message, you can also send the message directly and use
MPI_Probe
at the receiver side to determine the size. You are even allowed toMPI_Recv
with a buffer that is larger than the sent buffer, if you know an upper bound.
1 You could also consider the merge step as a reduction and parallelize the necessary computation for that.
-
Any idea how to make the slave dividing one more time the sub-array in range? For example if the numbers in the code are from 0 to 100 I would need them divided in 4 groups (1 master + 3 slaves) in ranges: 0-25/26-50/51-75/76-100. – Alex Alex Apr 03 '16 at 16:27
-
I'm afraid I don't really understand the question. Dividing itself is rather easy: `rank = (value - min) * num_ranks / (1 + max - min)`. – Zulan Apr 03 '16 at 16:32
-
Right now the master is dividing the array and sending the subarrays to the slaves keeping one for itself. What I have no idea how to do is to make the slave does another dividing of the subarray he gets, but it needs to be done in some ranges depending on the range of the numbers in the opened file. If the slave receives the numbers 1,27,33,45,50,71,23,5,99,88 it should group them in groups (1,23,5),(27,33,45,50), (71) and (99,88). All slaves and the master should do that and then the first group should be send to the master, the second group should be send to the first slave and etc. – Alex Alex Apr 03 '16 at 16:42
-
Why would the second division be different / more difficult than the first? And why would you wan to do that? – Zulan Apr 03 '16 at 17:15
-
So the master and the slaves should have numbers in one range that they can sort faster and then when I send them back to the master he will get them already sorted and sort will not be needed. – Alex Alex Apr 03 '16 at 17:23
-
I explained in my answer why it is generally a bad idea to split the values by range in the first place and what you can do instead. – Zulan Apr 03 '16 at 17:29
In principle your solution looks very good. I don't understand completely if for the larger files you are intending to process them in chunks or as a whole. From my experience I suggest that you assign as large as possible blocks to the slaves. This way the rather expensive message passing operations are executed only very seldom.
What I cannot understand in your question is what the overall goal of your program is. Is it your intention to sort the complete input files in parallel? If this is the case you will need some sort of merge sort to be applied to the results you receive from the individual processes.

- 1,786
- 11
- 24
-
Any idea how to make the slave dividing one more time the sub-array in range? For example if the numbers in the code are from 0 to 100 I would need them divided in 4 groups (1 master + 3 slaves) in ranges: 0-25/26-50/51-75/76-100. – Alex Alex Apr 03 '16 at 16:27