Using Grand Central Dispatch in Swift to parallelize and speed up “for" loops?

Question

I am trying to wrap my head around how to use GCD to parallelize and speed up Monte Carlo simulations. Most/all simple examples are presented for Objective C and I really need a simple example for Swift since Swift is my first “real” programming language.

The minimal working version of a monte carlo simulation in Swift would be something like this:

import Foundation

import Cocoa
var winner = 0
var j = 0
var i = 0
var chance = 0
var points = 0
for j=1;j<1000001;++j{
    var ability = 500

    var player1points = 0

    for i=1;i<1000;++i{
        chance = Int(arc4random_uniform(1001))
        if chance<(ability-points) {++points}
        else{points = points - 1}
    }
    if points > 0{++winner}
}
    println(winner)

The code works directly pasted into a command line program project in xcode 6.1

The innermost loop cannot be parallelized because the new value of variable “points” is used in the next loop. But the outermost just run the innermost simulation 1000000 times and tally up the results and should be an ideal candidate for parallelization.

So my question is how to use GCD to parallelize the outermost for loop?

score 5 · Accepted Answer · edited May 23 '17 at 11:53

5

A "multi-threaded iteration" can be done with dispatch_apply():

let outerCount = 100    // # of concurrent block iterations
let innerCount = 10000  // # of iterations within each block

let the_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply(UInt(outerCount), the_queue) { outerIdx -> Void in
    for innerIdx in 1 ... innerCount {
       // ...
    }
}

(You have to figure out the best relation between outer and inner counts.)

There are two things to notice:

arc4random() uses an internal mutex, which makes it extremely slow when called from several threads in parallel, see Performance of concurrent code using dispatch_group_async is MUCH slower than single-threaded version. From the answers given there, rand_r() (with separate seeds for each thread) seems to be faster alternative.
The result variable winner must not be modified from multiple threads simultaneously. You can use an array instead where each thread updates its own element, and the results are added afterwards. A thread-safe method has been described in https://stackoverflow.com/a/26790019/1187415.

Then it would roughly look like this:

let outerCount = 100     // # of concurrent block iterations
let innerCount = 10000   // # of iterations within each block

let the_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

var winners = [Int](count: outerCount, repeatedValue: 0)
winners.withUnsafeMutableBufferPointer { winnersPtr -> Void in

    dispatch_apply(UInt(outerCount), the_queue) { outerIdx -> Void in
        var seed = arc4random() // seed for rand_r() in this "thread"

        for innerIdx in 1 ... innerCount {
            var points = 0
            var ability = 500

            for i in 1 ... 1000 {
                let chance = Int(rand_r(&seed) % 1001)
                if chance < (ability-points) { ++points }
                else {points = points - 1}
            }
            if points > 0 {
                winnersPtr[Int(outerIdx)] += 1
            }
        }
    }
}

// Add results:
let winner = reduce(winners, 0, +)
println(winner)

edited May 23 '17 at 11:53

Community

1
1

answered Nov 24 '14 at 15:02

Martin R

529,903
94
1,240
1,382

Didn't know that about arc4random()! Be careful of modifying arrays from multiple threads - see this Q&A for more: http://stackoverflow.com/questions/26693838/swift-process-array-in-parallel-using-gcd/26790019#26790019 – Nate Cook Nov 24 '14 at 15:09
I think my second answer (not the accepted one) to that Q is the Swiftiest, since it explicitly gets a contiguous memory buffer for the array. – Nate Cook Nov 24 '14 at 15:10
1

@NateCook: I already had the feeling that I had seen something like that, but could not find it. Thanks for the reference and your solution (+1), I have updated the answer accordingly. – Martin R Nov 24 '14 at 15:20
Thanks. Especially for using my very simple example. This way I might have a chance of understanding what is going on. Can´t wait to try it later today. One question though: You are splitting the 1000000 “loops” into chunks before queueing them. Is that to avoid too much overhead? In the code I am actually using the innermost loop has a lot more manipulation of many variables. Would that effect how to divide the original 1000000 loops or is the more important factor how many “chunks” you feed to the queue? – user2523167 Nov 24 '14 at 15:50
@user2523167: See [dispatch_apply(3)](https://developer.apple.com/library/mac/Documentation/Darwin/Reference/ManPages/man3/dispatch_apply.3.html): *"Sometimes, when the block passed to dispatch_apply() is simple, the use of striding can tune performance. Calculating the optimal stride is best left to experimentation. ..."* – Martin R Nov 24 '14 at 15:55
Can someone please show the Swift 3 version of this? It's really confusing because you can't tell a particular queue to do performConcurrent and none of my code is working. – Richard Birkett Dec 18 '16 at 17:01
1

@RichardBirkett: Yes, I wondered about that as well. Have a look at the comments following this answer http://stackoverflow.com/a/39949292/1187415. – Martin R Dec 18 '16 at 19:46
@MartinR Thanks! So it should just work fine, and it will choose its queue totally independently of what queue it's sent in. So if I don't want the concurrent process to block the thread I can send it async to a queue of any priority and the contents will still be on the priority queue? Either way my code still isn't working :( – Richard Birkett Dec 18 '16 at 20:21

Rob · Answer 2 · 2020-05-10T05:51:19.790

Just to update this for contemporary syntax, we now use concurrentPerform (which replaces dispatch_apply).

So we can replace

for j in 0 ..< 1_000_000 {
    for i in 0 ..< 1000 {
        ...
    }
}

With

DispatchQueue.concurrentPerform(1_000_000) { j in
    for i in 0 ..< 1000 {
        ...
    }
}

Note, parallelizing introduces a little overhead, in both the basic GCD dispatch mechanism, as well as the synchronization of the results. If you had 32 iterations in your parallel loop this would be inconsequential, but you have a million iterations, and it will start to add up.

We generally solve this by “striding”: Rather than parallelizing 1 million iterations, you might only do 100 parallel iterations, doing 10,000 iterations each. E.g. something like:

let totalIterations = 1_000_000
let stride = 10_000
let (quotient, remainder) = totalIterations.quotientAndRemainder(dividingBy: stride)
let iterations = quotient + remainder == 0 ? 0 : 1

DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
    for j in iteration * stride ..< min(totalIterations, (iteration + 1) * stride) {
        for i in 0 ..< 1000 {
            ...
        }
    }
}

Using Grand Central Dispatch in Swift to parallelize and speed up “for" loops?

2 Answers2

Linked