
I have been experimenting with std::async. I wrote a (terribly inefficient) function that counts all prime numbers up to a given limit.

Without std::async the function always gives the expected result, as judged by a ctest unit test. Using std::async but no std::lock_guard / mutex, there is (as expected) a large deviation between the expected value and the calculated one. With std::lock_guard / mutex, the outcome is still not reproducible: it is correct in roughly 50% of the test runs, and in the other 50% the value is off by 1-2 (e.g. 9151 instead of 9152).

I am debating whether the problem is that my isPrime() function introduces another bottleneck. Alternatively, I suspect that the program (sometimes) terminates before the last thread has finished its work.

In either case, I don't understand why the std::lock_guard does not seem to protect the variable count.

// primeAsync.cc

#include <future>
#include <mutex>
#include <vector>

#include "primeAsync.h"
#include "prime.h"

// mutex for thread safety
static std::mutex s_PrimeMutex;

// handle to store futures
static std::vector<std::future<void>> s_Futures;

static void countPrimesHelper(int* count, const long long number);

// passes ctest every other time, count is off by 1-2 (e.g. 9151 instead of 9152)
int countPrimesAsync(const long long limit) {
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        s_Futures.push_back(std::async(std::launch::async, countPrimesHelper, &count, i));
    }
    return count;
}

// helper function for countPrimesAsync, std::async
static void countPrimesHelper(int* count, const long long number) {
    if(isPrime(number)) {
        const std::lock_guard<std::mutex> lock(s_PrimeMutex);
        ++(*count);
    }
}
// prime.cc

#include <iostream>
#include <vector>

// always passes ctest
bool isPrime(const long long n) {
    for(long long m = n - 1; m > 1; --m) {
        if(n % m == 0) {
            return false;
        }
    }
    return true;
}

// always passes ctest
int countPrimes(const long long limit) {
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        if(isPrime(i)) {
            count++;
        }
    }
    return count;
}
  • And where are you waiting for all the futures in the vector to finish? – Selvin Sep 12 '21 at 23:16
  • Thread sanitizer [detects a data race here](https://gcc.godbolt.org/z/1vaYY8MzW). You might have better symbol luck locally. – chris Sep 12 '21 at 23:22
  • Your count variable is a local variable in a function that starts lots of threads and then exits. So the count pointer being incremented by all those threads points to a variable that no longer exists. You are lucky it worked at all. Waiting for all the threads to finish before the count variable stops existing is the right idea. – Jerry Jeremiah Sep 12 '21 at 23:56

2 Answers


I changed the function to wait for all futures to finish before count is returned:

// now seems to pass ctest every time
int countPrimesAsync(const long long limit) {
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        s_Futures.push_back(std::async(std::launch::async, countPrimesHelper, &count, i));
    }

    for(auto& f : s_Futures) {
        f.wait();
    }

    return count;
}

That solution passes the test!

Alternatively, I found the following solution to check the status of a future:

Get the status of a std::future

Not sure which one is preferred (if any).
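
For completeness, a minimal sketch of that status-based approach, assuming the zero-timeout wait_for poll from the linked question (isReady is a hypothetical helper name, not code from either post):

// minimal sketch (not from the post): poll a future instead of blocking on wait()
#include <chrono>
#include <future>

static bool isReady(const std::future<void>& f) {
    // future_status::ready means the task has finished
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// usage, e.g. spin until every stored future has completed (needs <algorithm>):
// while(!std::all_of(s_Futures.begin(), s_Futures.end(), isReady)) {}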

Thanks already though!

In line with François Andrieux's comment, the following solution also works and passes the ctest; it does not require a static variable and is, I think, more elegant than a second for loop:

// now seems to pass ctest every time
int countPrimesAsync(const long long limit) {
    // local vector to store the futures
    std::vector<std::future<void>> futures;
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        futures.push_back(std::async(std::launch::async, countPrimesHelper, &count, i));
    }
    futures.clear();  // each future's destructor blocks until its task has finished
    return count;
}
  • Note that a `std::future` returned by `std::async` has the property that its destructor will block until the associated `async` function has completed. So for example you can use `s_Futures.clear()`. I would also suggest you consider moving `s_Futures` to be a local variable inside `countPrimesAsync`. – François Andrieux Sep 13 '21 at 00:16
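
As a small aside, here is a self-contained sketch of the blocking-destructor property mentioned in that comment (my own illustration, not from the post): forcing the future out of scope makes its destructor wait, so "work" is always printed before "done".

// sketch: the destructor of a std::async future blocks until the task finishes
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int main() {
    {
        auto f = std::async(std::launch::async, [] {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            std::cout << "work\n";
        });
    }   // ~future blocks here until the task has completed
    std::cout << "done\n";
}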

Another thing people haven't mentioned is that atomic variables let a value be read and written by multiple threads without a data race, and they typically do so much faster than a lock guard.

For example:

#include <atomic>

#include "prime.h"  // for isPrime()

std::atomic<int> count{0};  // explicitly initialized to zero

static void countPrimesHelper(std::atomic<int>* count, const long long number) {
    if(isPrime(number)) {
        (*count) += 1;  // single atomic read-modify-write
    }
}

This is free of data races even with 1 million threads accessing count: the increment is performed as a single atomic read-modify-write operation, so no thread can ever observe it in a half-completed state. On a recent processor this is done with a low-level locked instruction that covers only that one operation, which is why it is much cheaper than acquiring and releasing a mutex.

Here is more information on the atomic header: https://en.cppreference.com/w/cpp/atomic
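
Putting this together with the fix from the first answer, a minimal sketch (my own combination, not code from either post) that uses std::atomic<int> for the counter and still waits for every future before returning might look like this; isPrime is assumed to be the function from prime.cc:

#include <atomic>
#include <future>
#include <vector>

#include "prime.h"  // for isPrime()

static void countPrimesHelper(std::atomic<int>* count, const long long number) {
    if(isPrime(number)) {
        count->fetch_add(1);  // atomic increment, no mutex needed
    }
}

int countPrimesAsync(const long long limit) {
    std::atomic<int> count{0};
    std::vector<std::future<void>> futures;  // local, so nothing outlives count
    for(long long i = 2; i < limit; ++i) {
        futures.push_back(std::async(std::launch::async, countPrimesHelper, &count, i));
    }
    futures.clear();  // each destructor blocks until its task has finished
    return count.load();
}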

    @François Andrieux I agree, I quickly added the atomic solution in `countPrimesHelper`. The function still fails the test: I believe that even though an atomic may be faster than a lock guard, the main problem I was facing was that the futures were not ready when the program terminated. – iridiumcc Sep 13 '21 at 00:02
    @iridiumcc Yes, this answer is a distinct improvement but does not solve the core problem. – François Andrieux Sep 13 '21 at 00:15