1

I was trying to create this helper function in C# that returns the first n prime numbers. I decided to store the numbers in a dictionary in the <int,bool> format. The key is the number in question and the bool represents whether the int is a prime or not. There are a ton of resources out there calculating/generating the prime numbers(SO included), so I thought of joining the masses by crafting another trivial prime number generator.

My logic goes as follows:

public static Dictionary<int,bool> GetAllPrimes(int number)
    {
        Dictionary<int, bool> numberArray = new Dictionary<int, bool>();


        int current = 2;
        while (current <= number)
        {
            //If current has not been marked as prime in previous iterations,mark it as prime
            if (!numberArray.ContainsKey(current))
                numberArray.Add(current, true);

            int i = 2;
            while (current * i <= number)
            {
                if (!numberArray.ContainsKey(current * i))
                    numberArray.Add(current * i, false);
                else if (numberArray[current * i])//current*i cannot be a prime
                    numberArray[current * i] = false;
                i++;

            }
            current++;
        }
        return numberArray;
    }

It will be great if the wise provide me with suggestions,optimizations, with possible refactorings. I was also wondering if the inclusion of the Dictionary helps with the run-time of this snippet.

sc_ray
  • 7,803
  • 11
  • 63
  • 100
  • 2
    just one point - 1 isn't prime number. – nothrow Aug 06 '10 at 21:12
  • @Yossarian- point taken. Changed the snippet – sc_ray Aug 06 '10 at 21:14
  • 1
    Take a look on http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes which is interesting for a initial prime generator - it has a fine Java implementation. – Lasse Espeholt Aug 06 '10 at 21:22
  • Here is my sieve in Java: http://stackoverflow.com/questions/1042902/most-elegant-way-to-generate-prime-numbers/1043247#1043247 . It computes primes up to 1 billion in about a minute using a BitSet to represent primes. – starblue Aug 07 '10 at 11:08

7 Answers7

2

The first thing that bothers is that, why are you storing the number itself ?

Can't you just use the index itself which will represent the number?

PS: I'm not a c# developer so maybe it is not possible with a dictionary, but it can be done with the appropriate structure.

Soufiane Hassou
  • 17,257
  • 2
  • 39
  • 75
  • Do you have an example(in the language of your choice) where the numbers are being indexed? – sc_ray Aug 06 '10 at 21:18
  • 1
    I believe he means you can replace the numbers *with* the indexes - an array of Booleans would suffice in this case (`arr[0] = false, arr[1] = false, arr[2] = true, arr[3] = true, arr[4] = false, etc...`) – Dan J Aug 06 '10 at 21:22
  • That means allocating a data structure with a predefined size. A array of Booleans may impose pre-allocating and initializing the array before using it. – sc_ray Aug 06 '10 at 21:24
  • 1
    What about `List`? It can use indexes like an array (in fact, I'm pretty sure it uses an array internally), but automatically takes care of resizing and allocating more space, if needed. – Heinzi Aug 06 '10 at 21:32
2

First, you only have to loop untill the square root of the number. Make all numbers false by default and have a simple flag that you set true at the beginning of every iteration.

Further, don't store it in a dictionary. Make it a bool array and have the index be the number you're looking for. Only 0 won't make any sense, but that doesn't matter. You don't have to init either; bools are false by default. Just declare an bool[] of number length.

Then, I would init like this:

primes[2] = true;
for(int i = 3; i < sqrtNumber; i += 2) {

}

So you skip all the even numbers automatically.

By the way, never declare a variable (i) in a loop, it makes it slower.

So that's about it. For more info see this page.

Jouke van der Maas
  • 4,117
  • 2
  • 28
  • 35
2

Storing integers explicitly needs at least 32 bits per prime number, with some overhead for the container structure.

At around 231, the maximal value a signed 32 bit integer can take, about every 21.5th number is prime. Smaller primes are more dense, about 1 in ln(n) numbers is prime around n.

This means it is more memory efficient to use an array of bits than to store numbers explicitly. It will also be much faster to look up if a number is prime, and reasonably fast to iterate through the primes.

It seems this is called a BitArray in C# (in Java it is BitSet).

starblue
  • 55,348
  • 14
  • 97
  • 151
  • I really like the idea of using the BitArray and I also checked out your implementation using Java. I understand the merits of indexing, but the index itself is an integer. Can you shed some light on how indexing works in data structures such as BitArray that lowers the overhead but at the same time promotes faster lookup? – sc_ray Aug 07 '10 at 18:32
1

I'm pretty sure the Dictionary actually hurts performance, since it doesn't enable you to perform the trial divisions in an optimal order. Traditionally, you would store the known primes so that they could be iterated from smallest to largest, since smaller primes are factors of more composite numbers than larger primes. Additionally, you never need to try division with any prime larger than the square root of the candidate prime.

Many other optimizations are possible (as you yourself point out, this problem has been studied to death) but those are the ones that I can see off the top of my head.

Daniel Pryden
  • 59,486
  • 16
  • 97
  • 135
  • Thanks. If order is not of importance for some weird reason, how else does the Dictionary hurt performance? – sc_ray Aug 06 '10 at 21:39
1

1) From the perspective of the client to this function, wouldn't it be better if the return type was bool[] (from 0 to number perhaps)? Internally, you have three states (KnownPrime, KnownComposite, Unknown), which could be represented by an enumeration. Storing an an array of this enumeration internally, prepopulated with Unknown, will be faster than a dictionary.

2) If you stick with the dictionary, the part of the sieve that marks multiples of the current number as composite could be replaced with a numberArray.TryGetValue() pattern rather than multiple checks for ContainsKey and subsequent retrieval of the value by key.

Ani
  • 111,048
  • 26
  • 262
  • 307
  • numberArray.TryGetValue() pattern sounds like a good idea. Do you have any examples of it? – sc_ray Aug 06 '10 at 21:32
  • Actually, in this case, it will be faster to simply do: numberArray[current * i] = false; which does not throw even if the key already exists. FYI, the pattern is: bool isMarkedPrime; if(numberArray.TryGetValue(current * i, out isMarkedPrime) || isMarkedPrime) numberArray[current * i] = false; – Ani Aug 06 '10 at 21:39
  • That should have been if(!numberArray.TryGetValue(current * i, out isMarkedPrime) || isMarkedPrime) numberArray[current * i] = false; – Ani Aug 06 '10 at 21:45
1

The dictionary really doesn't make sense here -- just store all primes up to a given number in a list. Then follow these steps:

Is given number in the list?  
    Yes - it's prime.  Done.
Not in list
Is given number larger than the list maximum?
    No - it's not prime.  Done.
Bigger than maximum; need to fill list up to maximum.
Run a sieve up to given number.
Repeat.
Austin Salonen
  • 49,173
  • 15
  • 109
  • 139
0

The trouble with returning an object that holds the primes is that unless you're careful to make it immutable, client code is free to mess up the values, in turn meaning you're not able to cache the primes you've already calculated.

How about having a method such as:

bool IsPrime(int primeTest);

in your helper class that can hide the primes it's already calculated, meaning you don't have to re-calculate them every time.

Nathan Adams
  • 1,255
  • 1
  • 11
  • 17