After you've identified 2 and 3 as prime, all remaining primes take the form 6N±1, for N >= 1. To balance the workload across K threads, you should have each of the K threads stepping through its own sequence of values for N: thread T working on 1 + T + K * X, where for each thread, X sequences from 0 upwards. If you have 8 threads, it means thread 0 works on N₀ = { 1, 9, 17, … }, etc. It still means thread K-1 does more work than thread 0 because it is tackling bigger numbers, but the discrepancy is much less than when you slice the ranges horizontally. This means you need to provide each thread with a starting number, S, and the total number of threads, K, and the thread will then set a counter x to values 0, 1, ... and check for primes 6(S + xK)±1.
Now, with that as a decent basis for creating a multi-threaded prime finder. This is closely based on your code. It does use some code that is available in my SOQ (Stack Overflow Questions) repository on GitHub as files timer.c
, timer.h
, stderr.c
and stderr.h
in the src/libsoq sub-directory. It uses function isprime()
renamed from IsPrime3B()
found in the file isprime.c
in the src/Primes sub-directory of my SOQ repository. This program is prime-thread.c
from the same src/Primes
directory.
/* SO 6438-1942 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "stderr.h"
#include "timer.h"
#define NANO_PER_SEC 1.0E9
enum { NUM_THREADS = 8 };
enum { MAX_NUMBER = 10000000 };
static size_t counts[NUM_THREADS];
static const unsigned int small_primes[] =
{
5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
53, 59, 61, 67, 71, 73, 79, 83, 87, 89, 91, 97
};
enum { NUM_SMALL_PRIMES = sizeof(small_primes) / sizeof(small_primes[0]) };
/* IsPrime3B() from isprime.c - renamed is_prime() */
static int is_prime(unsigned number)
{
if (number <= 1)
return 0;
if (number == 2 || number == 3)
return 1;
if (number % 2 == 0 || number % 3 == 0)
return 0;
for (unsigned i = 0; i < NUM_SMALL_PRIMES; i++)
{
if (number == small_primes[i])
return 1;
if (number % small_primes[i] == 0)
return 0;
}
/* After 97, the next prime numbers are 101, 103, 107, 109 */
/*
** It would be feasible to start this loop from:
** i = (((small_primes[NUM_SMALL_PRIMES - 1] + 1) / 6) + 1) * 6
*/
for (unsigned i = 102; (i - 1) <= number / (i - 1); i += 6)
{
if (number % (i - 1) == 0 || number % (i + 1) == 0)
return 0;
}
return 1;
}
typedef struct prime_finder_vars
{
unsigned from;
unsigned to;
unsigned increment; /* Number of threads */
unsigned idx;
} PrimeFinderVars;
static void *prime_finder(void *pf)
{
PrimeFinderVars *pf_vars = (PrimeFinderVars *) pf;
printf("Thread %u: from = %u, to = %u, inc = %u\n",
pf_vars->idx, pf_vars->from, pf_vars->to, pf_vars->increment);
unsigned next = pf_vars->from;
while (next < pf_vars->to) {
unsigned six_n = 6 * next;
if (is_prime(six_n - 1))
++counts[pf_vars->idx];
if (is_prime(six_n + 1))
++counts[pf_vars->idx];
next += pf_vars->increment;
}
printf("Thread %u: done\n", pf_vars->idx);
return pf;
}
int main(int argc, char **argv)
{
err_setarg0(argv[0]);
if (argc != 1)
err_usage("");
struct timespec start;
struct timespec end;
double start_sec, end_sec, elapsed_sec;
int sum = 0;
Clock clk;
clk_init(&clk);
clk_start(&clk);
clock_gettime(CLOCK_REALTIME, &start);
pthread_t threads[NUM_THREADS];
PrimeFinderVars vars[NUM_THREADS];
int max_n = (MAX_NUMBER + 5) / 6;
for (int i = 0; i < NUM_THREADS; i++)
{
vars[i].from = i + 1;
vars[i].to = max_n;
vars[i].idx = i;
vars[i].increment = NUM_THREADS;
int rc;
if ((rc = pthread_create(&threads[i], NULL, prime_finder, &vars[i])) != 0)
err_syserr("failed to create thread %d: ", i);
}
for (int i = 0; i < NUM_THREADS; i++)
{
pthread_join(threads[i], NULL);
sum += counts[i];
}
clock_gettime(CLOCK_REALTIME, &end);
clk_stop(&clk);
start_sec = start.tv_sec + start.tv_nsec / NANO_PER_SEC;
end_sec = end.tv_sec + end.tv_nsec / NANO_PER_SEC;
elapsed_sec = end_sec - start_sec;
printf("Time 1: %.6f\n", elapsed_sec);
char buffer[32];
printf("Time 2: %s\n", clk_elapsed_us(&clk, buffer, sizeof(buffer)));
/* Because 2 and 3 are primes but are not analyzed */
size_t t_count = 2;
for (int i = 0; i < NUM_THREADS; i++)
{
t_count += counts[i];
printf("%d: %7zu primes found\n", i, counts[i]);
}
printf("Total primes found up to %d = %zu\n", MAX_NUMBER, t_count);
return 0;
}
Example output:
$ timecmd -u -- prime-thread
2020-10-16 12:15:05.101785 [PID 75174] prime-thread
Thread 0: from = 1, to = 1666667, inc = 8
Thread 7: from = 8, to = 1666667, inc = 8
Thread 2: from = 3, to = 1666667, inc = 8
Thread 3: from = 4, to = 1666667, inc = 8
Thread 5: from = 6, to = 1666667, inc = 8
Thread 4: from = 5, to = 1666667, inc = 8
Thread 6: from = 7, to = 1666667, inc = 8
Thread 1: from = 2, to = 1666667, inc = 8
Thread 0: done
Thread 6: done
Thread 4: done
Thread 7: done
Thread 3: done
Thread 5: done
Thread 2: done
Thread 1: done
Time 1: 0.231135
Time 2: 0.231135
0: 83090 primes found
1: 83176 primes found
2: 83023 primes found
3: 82996 primes found
4: 83060 primes found
5: 82995 primes found
6: 83179 primes found
7: 83058 primes found
Total primes found up to 10000000 = 664579
2020-10-16 12:15:05.341489 [PID 75174; status 0x0000] - 0.239704s
$
There are indeed 664,579 primes less than 10,000,000.
Note that the timecmd
program counts the entire running time (start-up and printing) of the prime-thread
, whereas the internal timing only counts the thread creation, running, and termination time. That accounts for the 8 ms timing difference. (It's a home-brew program that I use for timing commands. It's loosely similar to the system-provided time
command — but significantly different too.)
Given a list of the primes up to 10,000,000, it would be feasible to calculate how many primes each thread should have found. Given that the totals are correct, it is unlikely that there's a problem there, though.
Timing
Note that the question says it took 21 seconds to count the number of primes up to 8,000,000. This code took 0.231 seconds to count the number of primes up to 10,000,000.
That suggests that the isprime()
function in use is not as good as the one I used.
Indeed, the code shown is:
int is_prime(long num) {
int limit = round(sqrt(num));
for (long i = 2; i <= limit; i++) {
if (num % i == 0)
return FALSE;
}
return TRUE;
}
This does far more work than necessary. There's only one even prime number, 2. This code checks 4, 6, 8, … which are trivially non-prime. That's twice as much work as necessary. Checking for 2 and then only checking odd numbers would be a significant improvement. Checking 2, 3, and then numbers which match 6N±1 gives another improvement.
Even so, checking one third as much data would only improve things by a factor of 3. It is likely that the unbalanced workload is a bigger factor. With 8 threads (0..7), thread 7 working on the range 7,000,000..8,000,000 and it has a lot more computation to do than thread 0 working on the range 0..1,000,000, even though there are fewer primes for it to count.
The question doesn't show a complete MCVE (Minimal, Complete, Verifiable Example
— or MRE or whatever name SO now uses)
or an
SSCCE (Short, Self-Contained, Correct Example). It doesn't show how many threads are in use.
I have not, but perhaps should, parameterize prime-thread.c
to take a variable number of threads and a variable range for analysis (and remove the thread debugging printing) and see how much it changes behaviour. On my machine, it is unlikely that more threads will improve things; it may be that fewer threads would be better.
I have a program primes
which prints the primes in a given range using a Sieve of Eratosthenes. It prints all 664,579 primes up to 10 million in about 0.135 seconds when the output goes to (SSD) file or /dev/null
. That is significantly faster than prime-thread
manages to count the primes. Quite a lot of the benefit there is from the better algorithm. This isprime()
function does a lot of computation for each candidate number.
Two lessons to draw from this:
- Algorithms matter.
- Threads aren't a panacea for speeding things up.