Is it possible to come up with a distributed / multi core implementation of a prime sieve.

Question

I have been working on prime sieve algorithm, and the basic implementation is working fine for me. What I am currently struggling with is a way to divide and distribute the calculation on to multiple processors.

I know it would require storage of the actual sieve in a shared memory area or a text file, but how would one go about dividing the calculation related steps.

Any lead would help. Thanks!

Maybe each node could hold part of the sieve, and a central node could tell each one "now cross out all multiples of 3"? The problem is that this algorithm is all about manipulating one central data structure, nothing special happens computationally apart from loops. — RemcoGerlich, Mar 06 '14 at 11:41
You don't need to search the internet for guidance, even little ol' SO has possible avenues to explore, e.g. http://stackoverflow.com/questions/18601478/parallel-sieve-of-eratosthenes-java-multithreading — High Performance Mark, Mar 06 '14 at 11:44
Right, but the algo will still have to proceed in a linear way, I cannot assign crossing out of multiples of 4 while some node is still processing multiples of three? Or can I? — , Mar 06 '14 at 11:44
you can. the question is, what's the point? sieving is not efficient. why would you do it? — Karoly Horvath, Mar 06 '14 at 11:55

Bernhard Barker · Accepted Answer · 2014-03-06T12:14:37.737

Split the numbers into sections of equal size, each processor will be responsible for one of these sections.

Another processor (or one of the processors) will generate the numbers of which multiple needs to be crossed-off. And pass this number to each other processors.

Each of the processors will then use the remainder of the section size divided by the given number and its own section index to determine the offset into its own section, and then loop through and cross off the applicable numbers.

Alternatively, one could get a much simpler approach by just using shared memory.

Let the first processor start crossing off multiple of 2, the second multiples of 3, the third multiples of 5, etc.

Essentially just let each processor grab the next number from the array and run with it.

If you don't do this well, you may end up with the third crossing off multiples of 4, since the first didn't get to 4 yet when the third started, so it's not crossed off, but it shouldn't result in too much more work - it will take increasingly longer for a multiple of some prime to be grabbed by a processor, while it will always be the first value crossed off by a processor handling that prime, so the likelihood of this redundancy happening decreases very quickly.

Using shared memory like this tends to be risky - if you plan on using one bit per index, most languages don't allow you to work on that level, and you'll end up needing to do some bitwise operations (probably bitwise-AND) on a few bytes to make your desired changes (although this complexity might be hidden in some API), and many languages will also not have this operation be a so-called atomic operation, meaning one thread can get a value, AND it, and write it back, and another can come in and get the value before the first thread wrote it, AND it, and write it back after the first thread's write, essentially causing the first thread's changes to be lost. There's no simple, efficient fix for this - what exactly you need to do will depend on the language.

The shared memory thing will probably become a bottle neck as that way I will not be able to split the problem to multiple computers — , Mar 06 '14 at 11:53
if you use CPU-words(eg: 8 byte) per entry, then yeah, it's not risky, but storing the sieve that way is ridiculously expensive. — Karoly Horvath, Mar 06 '14 at 11:58
@KarolyHorvath Of course - I didn't think about it that. Amended my answer appropriately. Thanks. — Bernhard Barker, Mar 06 '14 at 12:11
@user3386925 That's certainly true. If each computer has multiple cores, it might be worth combining both approaches (on the set of computers you do the first approach, but each computer does the second approach), but note that the second approach is a lot more risky than I first thought and is probably best left alone, unless you're skilled at concurrent programming, or you'd be happy learning quite a bit about it / dealing with some problems related to it. — Bernhard Barker, Mar 06 '14 at 12:16

Is it possible to come up with a distributed / multi core implementation of a prime sieve.

1 Answers1