As a starting point I'd suggest OpenMP. With this you can very simply do three basic types of parallelism: loops, sections, and tasks.
Parallel loops
These allow you to split loop iterations over multiple threads. For instance:
#pragma omp parallel for
for (int i=0; i<N; i++) {...}
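If you were using two threads, then with a static schedule (which is typically the default) the first thread would perform the first half of the iterations and the second thread would perform the second half. This only gives correct results if the iterations are independent of one another.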
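To make that concrete, here is a minimal sketch of a complete parallel loop. The function name `add_arrays` and its parameters are made up for illustration. Each iteration writes only its own `out[i]`, so the iterations can safely be handed to different threads.

```c
/* Hypothetical helper (not from any library): element-wise sum of two arrays. */
void add_arrays(const double *a, const double *b, double *out, int n)
{
    /* Iterations are independent: iteration i reads a[i], b[i] and
       writes only out[i], so they can be split across threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The loop variable `i` is private to each thread automatically, while `a`, `b` and `out` are shared.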
Sections
These allow you to statically partition the work over multiple threads. This is useful when there is obvious work that can be performed in parallel. However, it's not a very flexible approach.
#pragma omp parallel sections
{
#pragma omp section
{...}
#pragma omp section
{...}
}
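As a rough sketch of where sections fit, suppose you need both the minimum and the maximum of an array. The two scans don't depend on each other, so each can go in its own section. The function `min_and_max` is a hypothetical example, and it assumes `n` is at least 1.

```c
/* Hypothetical helper: find the smallest and largest element of data.
   Assumes n >= 1. */
void min_and_max(const int *data, int n, int *min_out, int *max_out)
{
    int lo = data[0];
    int hi = data[0];

    /* lo and hi are shared, but each is written by exactly one section,
       so the two sections never touch the same variable. */
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 1; i < n; i++) {
                if (data[i] < lo) lo = data[i];
            }
        }
        #pragma omp section
        {
            for (int i = 1; i < n; i++) {
                if (data[i] > hi) hi = data[i];
            }
        }
    }   /* implicit barrier: both sections have finished here */

    *min_out = lo;
    *max_out = hi;
}
```

With only two sections, at most two threads have anything to do, which is the inflexibility mentioned above.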
Tasks
Tasks are the most flexible approach. These are created dynamically and their execution is performed asynchronously, either by the thread that created them, or by another thread.
#pragma omp task
{...}
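A task only runs in parallel if it is created inside a parallel region, and a common pattern is to have a single thread create the tasks while the rest of the team picks them up. The recursive Fibonacci below is a sketch of that pattern, not an efficient way to compute Fibonacci numbers. The names `fib` and `fib_task` are just illustrative.

```c
/* Recursive helper: each call spawns two child tasks. */
static long fib_task(int n)
{
    if (n < 2) {
        return n;
    }

    long x, y;

    /* x and y must be shared so the child tasks write to this
       call's variables rather than to private copies. */
    #pragma omp task shared(x)
    x = fib_task(n - 1);

    #pragma omp task shared(y)
    y = fib_task(n - 2);

    /* Wait for both child tasks before using their results. */
    #pragma omp taskwait
    return x + y;
}

long fib(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        /* One thread starts the recursion; the tasks it creates
           can be executed by any thread in the team. */
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

In real code you would stop creating tasks below some problem size, since a task per call costs far more than the addition it performs.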
Advantages
OpenMP has several things going for it.
Directive-based: the compiler does the work of creating and synchronizing the threads.
Incremental parallelism: you can focus on just the region of code that you need to parallelise.
One source base for serial and parallel code: The OpenMP directives are only recognized by the compiler when you run it with a flag (-fopenmp for gcc). So you can use the same source base to generate both serial and parallel code. This means you can turn off the flag to see if you get the same result from the serial version of the code or not. That way you can isolate parallelism errors from errors in the algorithm.
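The single-source point can be sketched like this. The function names are hypothetical; `_OPENMP` is the macro that an OpenMP-enabled compile defines, so it can guard anything that needs `omp.h`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: how many threads a parallel region could use.
   In a serial build there is no OpenMP runtime, so the answer is 1. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Hypothetical helper: sum of the integers 1..n. */
long sum_to(int n)
{
    long total = 0;

    /* Without the OpenMP flag this pragma is ignored and the loop runs
       serially. With it, each thread accumulates a private total and
       the reduction clause combines them at the end. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Building the same file once with `gcc -fopenmp` and once without should give the same value from `sum_to`. If the two builds disagree, the problem is in the parallelisation rather than in the algorithm.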
You can find the entire OpenMP spec at http://www.openmp.org/