Can't provide a C++11-specific answer since we're still mostly using pthreads. But, as a language-agnostic answer, you parallelise something by setting it up to run in a separate function (the thread function).
In other words, you have a function like:
    def processArraySegment (threadData):
        arrayAddr = threadData->arrayAddr
        startIdx = threadData->startIdx
        endIdx = threadData->endIdx

        for i = startIdx to endIdx:
            doSomethingWith (arrayAddr[i])

        exitThread()
and, in your main code, you can process the array in two chunks:
    int xyzzy[100]

    threadData->arrayAddr = xyzzy
    threadData->startIdx = 0
    threadData->endIdx = 49
    threadData->done = false
    tid1 = startThread (processArraySegment, threadData)

    // caveat coder: see below.

    threadData->arrayAddr = xyzzy
    threadData->startIdx = 50
    threadData->endIdx = 99
    threadData->done = false
    tid2 = startThread (processArraySegment, threadData)

    waitForThreadExit (tid1)
    waitForThreadExit (tid2)
(keeping in mind the caveat that you should ensure thread 1 has loaded the data into its local storage before the main thread starts modifying it for thread 2, possibly with a mutex or by using an array of structures, one per thread).
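Since the question asks about C++11, here's a rough, untested sketch of how that same split might look with std::thread. It sidesteps the caveat above by giving each thread its own small structure instead of reusing one shared threadData; the names (ThreadData, doSomethingWith, and so on) just mirror the pseudocode and are placeholders for whatever your real code does:

    #include <thread>
    #include <cstddef>

    // Placeholder for whatever per-element work needs doing.
    static void doSomethingWith (int &value) {
        value = value + 1;
    }

    // One of these per thread, so the main thread never has to reuse
    // (and therefore modify) a structure another thread is still reading.
    struct ThreadData {
        int         *arrayAddr;
        std::size_t  startIdx;
        std::size_t  endIdx;
    };

    // The thread function: processes only its own segment of the array.
    static void processArraySegment (ThreadData td) {
        for (std::size_t i = td.startIdx; i <= td.endIdx; ++i)
            doSomethingWith (td.arrayAddr[i]);
    }

    int main () {
        int xyzzy[100] = {0};

        ThreadData half1 = { xyzzy,  0, 49 };
        ThreadData half2 = { xyzzy, 50, 99 };

        // Each thread receives its own copy of its descriptor, so there
        // is no shared threadData to protect with a mutex.
        std::thread t1 (processArraySegment, half1);
        std::thread t2 (processArraySegment, half2);

        t1.join ();
        t2.join ();

        return 0;
    }

If you want more than two chunks, the same idea extends to an array of ThreadData and a std::vector of std::thread objects, joining them all at the end.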
In other words, it's rarely a simple matter of just modifying a for loop so that it runs in parallel, though that would be nice. Something like:
    for {threads=10} ({i} = 0; {i} < ARR_SZ; {i}++)
        array[{i}] = array[{i}] + 1;
Instead, it requires rearranging your code a bit to take advantage of threads.
And, of course, you have to ensure that it makes sense for the data to be processed in parallel. If you're setting each array element to the previous one plus 1, no amount of parallel processing will help, simply because you have to wait for the previous element to be modified first.
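For instance, in something like the following (the function name is just for illustration), each iteration needs the result of the one before it, so no split across threads can make the iterations independent:

    #include <cstddef>

    // Each element depends on the previous element's *new* value, so the
    // iterations form a chain and cannot run out of order or in parallel.
    void cumulativeFill (int *array, std::size_t size) {
        for (std::size_t i = 1; i < size; ++i)
            array[i] = array[i - 1] + 1;
    }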
The threaded examples further up simply use an argument passed to the thread function to specify which part of the array that thread should process. The thread function itself contains the loop that does the work.