
I'm trying to convert an existing single-threaded flood fill algorithm to a multithreaded one.

Input:
  • 2D bit array and its dimensions
  • (x, y) coordinates where the fill should begin

Output:
  • the same 2D bit array with updated bits

Problem:
  • only one thread at a time can write to a given 64-bit chunk (8×8 pixels) of the array, and no other thread can read that chunk while the write is in progress

I've started with a queue approach and a thread pool, so once a thread finishes its job it can take another task from the queue.

How would you organize thread synchronization to conform to the 'problem' statement?

The main problem is how to assign read/write locks to a given memory chunk.

Anonymous
  • Flood fill has never struck me as being amenable to multithreading. The next action is too dependent on previous results. Good luck. – Mark Ransom Dec 30 '14 at 16:57
  • Should it start at several coordinates, or at a single (x,y) one? – didierc Dec 30 '14 at 20:32
  • @didierc, just a single (x, y). It quickly expands to many, many points that need recursive filling. – Anonymous Dec 30 '14 at 20:36
  • You could divide the area into quadrants from the starting point, and have 4 threads working, one for each of them. – didierc Dec 30 '14 at 20:42
  • In fact, you should probably rethink the order in which new coordinates are pushed to the queue, so as to *group* coords in a way that helps threads avoid stepping on each other's toes. – didierc Dec 30 '14 at 20:47
  • If you have one thread going consistently in one direction, it will push coords on the sides of the line it draws to the queue. Now each new thread may pick a coord, and trace in the orthogonal direction away from the original line, and never "hit" another thread's line. Follow that process starting in both directions along, say, the X axis of the start position, and push valid coords above and below these lines, which new threads would trace vertically up and down. – didierc Dec 30 '14 at 20:56
  • For 8-direction flood fill, things get a bit more complicated though. – didierc Dec 30 '14 at 20:57
  • Regarding the limitation on bit blocks, simply group work by blocks, i.e. make threads work on blocks rather than pixels. – didierc Dec 30 '14 at 21:00
  • @didierc, having 4 direction-specialized threads does the job pretty well (accepting a limit of at most 4 running threads at a time). Indeed, block-limited threads are something I'd like to switch to, but it involves multiple points on neighbouring block boundaries which must be filled. That complicates the block thread's input, as it needs to accept multiple start points. – Anonymous Dec 30 '14 at 21:54

1 Answer


Generally you want to divide the data as coarsely as possible and minimize communication between threads. Communication includes shared data structures, even lock-free ones, and especially those with shared variables that are written to.

This general "coarse" policy avoids the common pitfalls (for example, false sharing) that prevent scaling.
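As a small illustration of the false-sharing pitfall: if each worker thread keeps a counter (say, of filled pixels), adjacent counters packed into one array can land on the same cache line, and concurrent increments will ping-pong that line between cores. A minimal sketch, assuming a 64-byte cache line (a common but not universal size; the names here are illustrative):

```cpp
#include <cstdint>

// Each worker's counter lives on its own cache line, so concurrent
// increments by different threads do not invalidate each other's lines.
// alignas(64) forces both the alignment and (via padding) the size to 64.
struct alignas(64) PaddedCounter {
    std::uint64_t value = 0;
};

// One counter per worker thread, each on a separate cache line.
PaddedCounter perThreadFilledPixels[4];
```

Without the alignment, four 8-byte counters would fit in a single 64-byte line and every increment by one thread would evict the line from the other threads' caches.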

As for your specific problem, I have to confess I'm not intimately familiar with flood fill algorithms, so I'm not immediately able to sketch out a coarse approach.

However, if a coarse approach is not feasible and a strategy for locking individual chunks is needed, lock striping could be an approach worth investigating.
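The idea of lock striping is to avoid one mutex per 8×8 block (potentially thousands of mutexes) by keeping a small fixed pool of mutexes and hashing each block index onto one of them. A minimal sketch, assuming the grid is stored as one 64-bit word per 8×8 block; the class name, the stripe count of 16, and the modulo hash are illustrative choices, not anything from the question:

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Lock-striping sketch: a small pool of mutexes guards many blocks.
// Two blocks that hash to the same stripe contend; most do not.
class StripedGrid {
public:
    StripedGrid(std::size_t widthBlocks, std::size_t heightBlocks)
        : widthBlocks_(widthBlocks),
          blocks_(widthBlocks * heightBlocks, 0) {}

    // Set one pixel; (x, y) are pixel coordinates.
    void setPixel(std::size_t x, std::size_t y) {
        const std::size_t idx = blockIndex(x, y);
        std::lock_guard<std::mutex> guard(stripeFor(idx));
        blocks_[idx] |= std::uint64_t{1} << bitOf(x, y);
    }

    bool getPixel(std::size_t x, std::size_t y) {
        const std::size_t idx = blockIndex(x, y);
        std::lock_guard<std::mutex> guard(stripeFor(idx));
        return (blocks_[idx] >> bitOf(x, y)) & 1;
    }

private:
    static constexpr std::size_t STRIPES = 16;  // illustrative choice

    std::size_t blockIndex(std::size_t x, std::size_t y) const {
        return (y / 8) * widthBlocks_ + (x / 8);
    }
    static unsigned bitOf(std::size_t x, std::size_t y) {
        return static_cast<unsigned>((y % 8) * 8 + (x % 8));
    }
    std::mutex& stripeFor(std::size_t blockIndex) {
        return stripes_[blockIndex % STRIPES];  // hash block onto a stripe
    }

    std::size_t widthBlocks_;
    std::vector<std::uint64_t> blocks_;
    std::mutex stripes_[STRIPES];
};
```

Holding the stripe's mutex for both reads and writes satisfies the stated constraint that no other thread may read a 64-bit chunk while it is being written.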

A lock-free implementation is another possibility. You could use a compare-and-swap operation for the writes (InterlockedCompareExchange64 in Visual Studio), combined with retry logic for the case where another thread wrote to the same 8×8-pixel 64-bit block.
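A minimal sketch of the CAS retry loop, using std::atomic as a portable stand-in for InterlockedCompareExchange64 (the function name and parameters are illustrative):

```cpp
#include <atomic>
#include <cstdint>

// Set one pixel bit in an 8x8 block with a compare-and-swap retry loop.
// If another thread modified the block between our load and our CAS,
// compare_exchange_weak fails, reloads `expected`, and we retry.
void setPixelLockFree(std::atomic<std::uint64_t>& block, unsigned bit) {
    std::uint64_t expected = block.load(std::memory_order_relaxed);
    std::uint64_t desired;
    do {
        desired = expected | (std::uint64_t{1} << bit);
    } while (!block.compare_exchange_weak(expected, desired,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
}
```

For this particular operation (setting bits that are never cleared) an atomic fetch_or would also do the job in one call; the explicit loop is shown because the retry pattern generalizes to writes that are not simple bitwise ORs.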

It may be possible to relax the read locking completely. If two threads end up painting the same pixels, that only wastes some cycles; it does not corrupt the results.

A lock-free implementation could be several times faster.

If you are working in Java, Java Concurrency in Practice by Goetz is a great book on topics like lock striping.

Peter Lamberg
  • Thanks for the feedback. So in your opinion, is this going to kill parallel processing due to the cache architecture, or are you giving me a hint about how not to run into cache-line problems? – Anonymous Dec 30 '14 at 15:21
  • I'll try to keep adding to my answer. I'm watching a 1yr old at the same time :D – Peter Lamberg Dec 30 '14 at 15:23
  • I was merely trying to warn about the nasty, hard-to-see problems that may ruin otherwise clever parallelization strategies. I'll maybe try to rephrase my post... – Peter Lamberg Dec 30 '14 at 15:32
  • Thank you, I'm sketching this thing in C++ with Win32 synchronization objects + interlocked ops for thread management. I'm looking into lock striping to see if it suits me. PS: watch out for diaper stack overflow :) – Anonymous Dec 30 '14 at 15:58
  • Please post your results here. Maybe a separate answer. – Peter Lamberg Dec 30 '14 at 16:45
  • Of course you could go lock-free. Do every write with a compare-and-swap intrinsic combined with retry logic. It is much faster, as there are no thread management operations involved. – Peter Lamberg Dec 30 '14 at 18:57
  • I'll try to post it tomorrow, with benchmark results. Right now I'm facing a problem with out-of-bounds array access :) – Anonymous Dec 30 '14 at 21:58