C - Sieve of Eratosthenes - BitField

Question

I'm about to implement the Sieve of Eratosthenes and have a general question regarding the sieve-array.

I've implemented the sieve quite a few times now (in C) and always used an array of uint8_t (out of <stdint.h>) as the sieve. This is pretty memory inefficient, since 8 bits are used for each number to sieve, even though one bit should be sufficient.

How would I go about this in C? I need an array of bits. I could pretty much create an array of any type (uint8_t, uint16_t, uint32_t, uint64_t) and access the single bits with bit masks, etc.

What data type should I prefer and what operations should I use to access the bits without performance loss?

PS: I don't think this is a duplicate of just a BitArray implementation, since it the question is specific about the Sieve of Eratosthenes, since it's main nature needs to be efficient (not only in memory usage, but in access). I was thinking, that maybe different tricks can be used to make the sieving process more efficient...

I would say that the link with efficient implementations of the sieve of Eratosthenes is enough to make this *not* a duplicate of the other question. This strikes me as an interesting question which should be reopened. — John Coleman, Jun 27 '16 at 17:59
It can be made even more memory-efficient by ignoring even numbers in the sieve (since only `2` is prime), so 8 bits can represent the status of 16 numbers. — Weather Vane, Jun 27 '16 at 18:16
@JohnColeman The mention of the sieve is pretty much irrelevant: the question would still make sense (with one minor edit) if the first paragraph was deleted. Even if it is not a duplicate, it should probably be closed as Too Broad or Off Topic (no code, no research effort shown on the actual problem). — RoadieRich, Jun 27 '16 at 18:24
Following the question edit, perhaps on CodeReview - with the code ;) — Weather Vane, Jun 27 '16 at 18:28
@WeatherVane I was thinking of that as well. Thank you :) I think with some clever tricks it is possible to leave multiples of 3 and 5 as well. The question is where to stop/where the tricks become more complicated/hard to compute than the actual calculation. — Matthias, Jun 27 '16 at 18:28
I have tried that with just 2, 3 but it makes the algorithm quite a bit more complex, with the varying steps you have to take through the sieve. A 16-fold improvement is worth implementing, if that makes a difference between not/or enough memory available. In one problem I solved, there was still nowhere near enough memory, until I realised that the *range* of interest was containable, as well as the *square root* of the maximum number on test, so I implemented 2 arrays, being discontinuous. — Weather Vane, Jun 27 '16 at 18:32
@WeatherVane I can't follow all the things you say. What do you mean by a *16-fold improvement* and *the range of interest*? — Matthias, Jun 27 '16 at 18:38
That means where you used one number status per byte of storage, my suggestion puts 16 numbers in one byte of storage - rather than the 8 in your question. By *range of interest* I mean that the range of numbers to be considered in the question I was working on, would fit a memory array, although there was no chance to fit the *entire* sieve in memory. — Weather Vane, Jun 27 '16 at 18:40
Using a mod-30 wheel (skipping 2, 3, and 5) is pretty common, as there are 8 candidates per mod, which means they fit nicely in a byte. You div 30 to get which byte, then index by mod 30 result to get the bit. There are a few mod-210 and mod-2310 implementations, but it's a lot more work for not a lot of benefit. — DanaJ, Jun 27 '16 at 20:12

dbush · Accepted Answer · 2016-06-28T19:10:56.290

As mentioned by Weather Vane in his comments, you can save additional space by only considering every other number, since all even number except 2 are non-prime.

So in your bit array, each bit represents an odd number.

Here's an implementation I did a few years back using this technique.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <math.h>
#include <stdint.h>

uint8_t *num;
int count = 0;
FILE *primefile;

int main(int argc, char *argv[])
{
  int i,j,root;
  time_t t;

  if (argc>1) count=atoi(argv[1]);
  if (count < 100) {
    fprintf(stderr,"Invalid number\n");
    exit(1);
  }
  if ((num=calloc(count/16,1))==NULL) {
    perror("calloc failed");
    exit(1);
  }
  if ((primefile=fopen("primes.dat","w"))==NULL) {
    perror("Coundn't open primes.dat");
    exit(1);
  }
  t=time(NULL);
  printf("Start:\t%s",ctime(&t));
  root=floor(sqrt(count));
  // write 2 to the output file
  i=2;
  if (fwrite(&i,sizeof(i),1,primefile)==0) {
    perror("Couldn't write to primes.dat");
  }
  // process larger numbers
  for (i=3;i<count;i+=2) {
    if ((num[i>>4] & (1<<((i>>1)&7)))!=0) continue;
    if (fwrite(&i,sizeof(i),1,primefile)==0) {
      perror("Couldn't write to primes.dat");
    }
    if (i<root) {
      for (j=3*i;j<count;j+=2*i) {
        num[j>>4]|=(1<<((j>>1)&7));
      }
    }
  }
  t=time(NULL);
  printf("End:\t%s",ctime(&t));
  fclose(primefile);
  return 0;
}

Here, num is the bit array, allocated dynamically based on the upper bound of your search. So if you were looking for all primes up to 1000000000 (1 billion), it uses 64000000 (64 million) bytes of memory.

The key expressions are as follows:

For a "normal" bit array:

Set bit i:

num[i>>3] |= (1<<(i&7);
// same as num[i/8] |= (1<<((i%8));

Check bit i:

(num[i>>3] & (1<<(i&7))) != 0
// same as (num[i/8] & (1<<(i%8))) != 0

Since we're only keeping track of every other number, we divide i by 2 (or equivalently, shift right by 1:

num[i>>4] |= (1<<((i>>1)&7);
// same as num[(i/2)/8] |= (1<<(((i/2)%8));

(num[i>>4] & (1<<((i>>1)&7))) != 0
// same as (num[(i/2)/8] & (1<<((i/2)%8))) != 0

In the above code, there are some micro-optimizations where division and modulus by powers of 2 are replaced with bit shifts and bitwise-AND masks, but most compilers should do that for you.

Thank you really much for the answer. I'll report back, when I've implemented the sieve. — Matthias, Jun 27 '16 at 20:11
`if ((num=calloc(count/16,1))==NULL) {` is a problem. Example: `count == 4`, then no usable memory is allocated yet `if ((num[0>>4]` is called. — chux - Reinstate Monica, Jun 28 '16 at 19:03
@chux Good catch. Changed the input validation so `count` is at least 100. — dbush, Jun 28 '16 at 19:05
I small thing I just noticed: I think to check bit i `(num[i>>3] & (1<<(i&7)) != 0` should be used - notice the parentheses, since `!=` has a higher precedent than `&`. PS: Also in *Set bit i* you missed to close some parentheses. — Matthias, Jun 28 '16 at 19:09
@Matthias Yes. Those parenthesis are in the full code but not in the extracted example. Fixed. — dbush, Jun 28 '16 at 19:11
Hmm, concerning "can save additional space by only considering every other number, since all even number except 2 are non-prime." --> This is an easy to employ, but only halves the memory yet complicates `(num[i>>4] & (1<<((i>>1)&7)))!=0` . Could say that after 30, for every 30 `int` there are at most 8 primes and get about another memory halving. In both cases, not an important attribute to the "Sieve of Eratosthenes". — chux - Reinstate Monica, Jun 28 '16 at 19:34

score 2 · Answer 2 · answered Jun 27 '16 at 19:19

Largest native types (probably uint64_t) tend to perform the best. You can store your bitmasks in an array or generate them on-site by bitshifting. Contrary to intuition, the on-site generation might perform better due to better caching/memory access characteristics. In any case, it's a good idea to start code it in a fairly general fashion (e.g., define your types macros if you use pure C) and then test the different versions.

Caching (persistently or nonpersistently) some of your results might not be a bad idea either.

Thanks! I'll report back with my implementation in the next few days. — Matthias, Jun 27 '16 at 20:12

C - Sieve of Eratosthenes - BitField

2 Answers2