I am developing code for the scientific computing community, particularly iterative solvers for linear systems of equations (Ax=b form).
I have used BLAS and LAPACK for the primitive matrix subroutines, but I now realize there is some scope for manual parallelization. I am working on a shared-memory system, which leaves me with two choices: OpenMP and Pthreads.
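To make the comparison concrete, here is a minimal sketch of the kind of kernel an iterative solver spends its time in: a dense matrix-vector product parallelized over rows with a single OpenMP pragma. The function name and row-major layout are my own assumptions, not from any particular library; the same loop written with Pthreads would need explicit thread creation, work partitioning, and joining.

```c
#include <stddef.h>

/* Hypothetical example kernel: y = A*x for a dense n x n matrix A
   stored in row-major order. The pragma splits the row loop across
   threads; without -fopenmp the pragma is ignored and the code runs
   serially, which is part of OpenMP's appeal for incremental tuning. */
void matvec(size_t n, const double *A, const double *x, double *y) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            sum += A[i * (size_t)n + j] * x[j];
        y[i] = sum;
    }
}
```

Rows are independent, so no synchronization is needed beyond the implicit barrier at the end of the loop, which is the common case in sparse/dense matvec and vector updates inside Krylov methods.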
Assuming that development time isn't the greatest factor (and the performance of the code is), which is the better, more future-proof, and perhaps more portable (to CUDA) way of parallelizing? Is the time spent on Pthreads worth the performance boost?
I believe my application (which basically starts many computations at once and then operates on the "best" value from all of them) will benefit from explicit thread control, but I'm afraid the coding will take too much time and in the end there will be no performance payoff.
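For what it's worth, the "start many things, keep the best" pattern I describe does not necessarily require explicit thread control; it can be written as a parallel loop with a per-thread local best and one short critical section to merge. The `evaluate` function below is a made-up stand-in for a real candidate solve, purely for illustration:

```c
#include <float.h>
#include <math.h>

/* Stand-in scoring function for one candidate (hypothetical;
   in my application this would be a full solve/estimate). */
static double evaluate(int k) {
    return fabs((double)k - 5.3);
}

/* Fan out ncand candidates across threads, return the index of the
   candidate with the smallest score; the score is written to *score.
   Each thread tracks a local minimum, then merges under a critical
   section, so contention is one lock per thread, not per candidate. */
int best_candidate(int ncand, double *score) {
    int best_k = -1;
    double best_score = DBL_MAX;
    #pragma omp parallel
    {
        int local_k = -1;
        double local_score = DBL_MAX;
        #pragma omp for nowait
        for (int k = 0; k < ncand; ++k) {
            double s = evaluate(k);
            if (s < local_score) { local_score = s; local_k = k; }
        }
        #pragma omp critical
        if (local_score < best_score) {
            best_score = local_score;
            best_k = local_k;
        }
    }
    *score = best_score;
    return best_k;
}
```

With Pthreads the same pattern needs a thread argument struct, `pthread_create`/`pthread_join`, and a mutex around the merge; functionally equivalent, just more code to maintain.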
I have already looked at a few similar questions here, but they all pertain to general applications.
One concerns a generic multithreaded application on Linux; another is a general question as well.
I am aware of SciComp.SE, but felt this was more on topic here.