My program needs to scatter a matrix between processes. The matrix is stored in memory as a 1D array. In my first version I scattered the matrix between the processes by rows. To compute its part of the result correctly, each process sends some rows of its local matrix to the others, using the sendrecv function. So far everything works well.

It has now occurred to me that if the matrix has many more columns than rows, it would be better to scatter it by columns instead of rows, so that the processes have fewer elements of their local matrices to exchange, which should improve the scalability of the program. The question is: how can I scatter the matrix by columns? And then, how can I select the proper columns for the processes to send to each other?

user73793

1 Answer

If possible, try changing your 1D array from row-major order to column-major order, scatter it, perform the computation, receive the results, and then convert them back from column-major to row-major order. Depending on your matrix, the cost of converting back and forth might be greater than the savings obtained by parallelizing along the columns. See the boost::multi_array documentation (http://www.boost.org/doc/libs/1_55_0b1/libs/multi_array/doc/user.html#sec_storage).
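To make the idea above concrete, here is a minimal sketch of the layout change and of the per-process counts and displacements for a column-wise split. The names `to_column_major`, `ColumnLayout` and `column_layout` are hypothetical helpers made up for illustration, and the element type `double` is an assumption. The sketch contains no MPI calls; it only prepares the arrays you would hand to something like `MPI_Scatterv`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert a rows x cols matrix from row-major to column-major storage.
// Element (r, c) moves from index r * cols + c to index c * rows + r.
std::vector<double> to_column_major(const std::vector<double>& a,
                                    std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    std::vector<double> out(a.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[c * rows + r] = a[r * cols + c];
    return out;
}

// Hypothetical helper type: element counts and displacements per process,
// stored as int because that is what MPI_Scatterv takes.
struct ColumnLayout {
    std::vector<int> counts;  // number of elements sent to each process
    std::vector<int> displs;  // offset of each process's first element
};

// Split `cols` columns over `nprocs` processes as evenly as possible.
// The first (cols % nprocs) processes get one extra column.
ColumnLayout column_layout(int rows, int cols, int nprocs) {
    assert(rows > 0 && cols >= 0 && nprocs > 0);
    ColumnLayout layout;
    layout.counts.resize(nprocs);
    layout.displs.resize(nprocs);
    int offset = 0;
    for (int p = 0; p < nprocs; ++p) {
        int local_cols = cols / nprocs + (p < cols % nprocs ? 1 : 0);
        layout.counts[p] = local_cols * rows;  // whole columns only
        layout.displs[p] = offset;
        offset += layout.counts[p];
    }
    return layout;
}
```

Because each column is contiguous in column-major storage, every process receives a contiguous block of whole columns. For the exchange between neighbours, assuming each process needs `h` boundary columns from its neighbour, the first `h` columns of a local block start at offset 0 and the last `h` start at `(local_cols - h) * rows`; each is a contiguous run of `h * rows` elements, so it can be passed to sendrecv the same way the rows were in the row-wise version.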

eswarp25
  • Oh yes... I can save it directly in column-major order without the transformation, so this is good. Now I have to think about whether my specific algorithm works with it. Anyway, thanks. – user73793 Jun 03 '14 at 20:33
  • I managed to implement what I wanted. The problem now is that the scalability seems to have decreased. The only thing I did was create another function similar to the first one. I check whether the number of rows is higher than the number of columns; in that case the matrix is stored in row-major order and I run the algorithm as before. Otherwise I store the matrix in column-major order and run a similar (the same, but dual) algorithm. Is it possible that just adding an if and choosing which function to run decreases the performance and scalability? This looks quite weird to me... – user73793 Jun 04 '14 at 19:46
  • @user73793: What does your algorithm do? Does it perform calculations like a sum over a row or a column? Or does it only work with individual values rather than rows or columns? I cannot say much without knowing anything about your algorithm. – eswarp25 Jun 04 '14 at 19:53
  • It computes the 2d convolution between an input matrix and a kernel so the computation goes through all the elements of the input matrix. If you want I can send you my code. – user73793 Jun 04 '14 at 19:58
  • @user73793: If the code is not too large, edit the question and paste the code. Otherwise use a service like pastebin.com and provide the link. – eswarp25 Jun 04 '14 at 20:04
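  • @user73793: If the algorithm is a convolution, the decrease in speedup might be due to reduced cache locality when the data is stored in column-major order: if the loops still walk along rows, consecutive accesses are a whole column apart in memory. Note that storage order is a property of the language and the code rather than of the processor; C and C++ lay out built-in 2D arrays in row-major order. – eswarp25 Jun 04 '14 at 20:09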
  • You can find my code on pastebin.com. I'm user rdb987. There are three files related to my project. – user73793 Jun 04 '14 at 20:18