Can I safelly share a variable across threads in C++ using only std::atomic without std::mutex?

Question

I made a program which calculates prime numbers on multiple cores. (Please ignore the fact, that the algorithm is not perfectly effective and numbers 0 and 1 are here considered to be primes. The purpose is only to practice using threads.)

The variable taken (a number to be tested next) is beeing shared across 8 threads.

The problem is that it can be incremented by one thread and right after that by another thread and read by them when its already incremented twice (or more times), so some values can be skipped, which is a bad thing.

I assumed it can be solved by using std::atomic_uint as the variable type, but I was apparently wrong.

Is there any way I could solve this problem without the need of using std::mutex since I heard it causes considerable overhead? source code:

#include <iostream>
#include <chrono>
#include <vector>
#include <algorithm>
#include <thread>
#include <atomic>

int main()
{
    const uint MAX = 1000;

    std::vector<bool> isPrime(MAX), done(MAX);
    std::fill(done.begin(), done.end(), false);
    std::atomic_uint taken{0}; //shared variable
    std::vector<std::thread> threads;
    auto start = std::chrono::system_clock::now();

    for (uint i = 0; i < 8; ++i) {
        threads.emplace_back(
            [&](){
                bool res;
                for (uint tested; (tested = taken.fetch_add(1)) < MAX; ) { //taken should be incremented and copied atomically
                    res = true;
                    for (uint k = 2; k < tested; ++k) {
                        if (tested % k == 0) {
                            res = false;
                            break;
                        }
                    }
                    isPrime[tested] = res;
                    done[tested] = true;
                }
            }
        );
    }
    for (auto & t : threads) {
        t.join();
    }

    auto end = std::chrono::system_clock::now();
    auto milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    uint num = std::count_if(isPrime.begin(), isPrime.end(), [](bool b){return b;});
    uint nDone = std::count_if(done.begin(), done.end(), [](bool b){return !b;});
    std::cout << "number: " << num << " duration: " << milliseconds.count() << '\n';
    std::cout << "not done: " << nDone << '\n';
    for (uint i = 0; i < MAX; ++i) { //Some numbers are always skipped
        if (!done[i]) {
            std::cout << i << ", ";
        }
    }
    std::cout << '\n';
    return 0;
}

The code was compiled using g++ with -O3 and -pthread arguments. output:

number: 169 duration: 1
not done: 23
143, 156, 204, 206, 207, 327, 328, 332, 334, 392, 393, 396, 502, 637, 639, 671, 714, 716, 849, 934, 935, 968, 969,

The output is different every time.

beware of `std::vector`, it will give race condition even if never overlapping indices are accessed ... — Massimiliano Janes, Jan 16 '18 at 15:45
For more information on std::vector https://stackoverflow.com/questions/33617421/write-concurrently-vectorbool — UKMonkey, Jan 16 '18 at 15:49
I agree with the comment above: The problem probably is with the shared `vector` variables. One solution would be for each thread to build its own independent list of pairs {number-tested, test-result}, and then have the main thread merge the lists after joining all of the worker threads. — Solomon Slow, Jan 16 '18 at 15:59
The loop seem right to me. Each time `taken` is incremented, its fetched value (`tested`) is processed. I guess the problem is in data race due to `std::vector`. Try to change it to `std::vector` and see the effect. — Daniel Langr, Jan 16 '18 at 15:59
@DanielLangr, Accessing a shared `vector` or a shared `vector` without synchronization may be less risky than accessing a shared `vector`, but it's still _undefined behavior_, — Solomon Slow, Jan 16 '18 at 16:05
@jameslarge I don't agree. You can access different memory locations from multiple threads without synchronization. Distinct elements of `vector` or `vector` are different memory locations. You just need to be sure not to update elements with same indices, but this in ensured in OPs code. — Daniel Langr, Jan 16 '18 at 16:06
Simply replace `std::vector isPrime(MAX), isDone(MAX);` by `std::array isPrime, isDone;`. You'll fix your data race plus have a nice time optimization for free. demo: http://coliru.stacked-crooked.com/a/36110afafdea0ee8 — YSC, Jan 16 '18 at 16:07
Possible duplicate of [Write concurrently vector](https://stackoverflow.com/questions/33617421/write-concurrently-vectorbool) — YSC, Jan 16 '18 at 16:08

score 6 · Accepted Answer · answered Jan 16 '18 at 16:04

6

The specialization std::vector<bool> compresses values to individual bits. Therefore, there are multiple vector elements in a single byte, i.e., in a single memory location. Consequently, your threads updates same memory locations without synchronization, which is a data race (and therefore undefined behavior according to Standard).

Try to change std::vector<bool> to std::vector<char>.

answered Jan 16 '18 at 16:04

Daniel Langr

22,196
3
50
93

Overwriting `std::vector` to `std::vector` solved it. Thank you very much. – Petr Král Jan 16 '18 at 16:15

Can I safelly share a variable across threads in C++ using only std::atomic without std::mutex?

1 Answers1