3

Goal: I'd like to distribute a couple of billions of points into bins. Bins should be flushed to disk to keep memory footprint sane.

My attempt: Whenever a bin reaches a threshold, e.g. 1 million points, I'd like to spawn a thread that writes the points to disk. Data is written to one file per bin; Multiple threads can be spawned for different bins, but at most one thread per bin. I'm doing this by checking a bool named "flushing". If a bin starts beeing flushed, it's set to true in the main thread, and back to false by the write thread.

Question: Will this cause threading issues? My assumption is that there shouldn't be an issue since "flushing" can only become true when the thread has already done it's job and a new thread is allowed to spawn. It's okay if the bins become larger than 1 million points in the meantime.

struct Bin{
    vector<Point> points;
    bool flushing = false;
}

vector<Bin> bins;

void add(Point point){

    int index = computeBinIndex(point);

    Bin& bin = bins[index];
    bin.points.push_back(point);

    // only start flushing if bin.flushing == false
    if(bin.points.size() > 1'000'000 && bin.flushing == false){
        flush(bin);
    }

}

void flush(Bin& bin){

    vector<Point> points = std::move(bin.points);
    bin.points = vector<Point>();
    bin.flushing = true;

    thread t([points, bin](){

        saveToDisk(points);

        // we're done, set bin.flushing back to false
        bin.flushing = false;
    });
    t.detach();

}
Markus
  • 2,174
  • 2
  • 22
  • 37
  • If bandwith to disk is an issue, then I would recommend not to write several files simultaneously. Having several threads is OK but I would synchronize them with a mutex arround the I/O section. Why do you require lock-free BTW ? – Meatboy 106 Jan 08 '20 at 12:50
  • @Meatboy106 Multiple write threads usually made things faster than a single write thread for me, but yes, I'll make sure that the number of write threads is capped to something between 4-16 threads. – Markus Jan 08 '20 at 12:53
  • I can reach more than 3 GiB/s output from a single thread writting to a ramdisk. It is probably more than what your disk is capable of. Serialization of your data might be the limiting factor. I would then recommend performing the serialization of your bins in parallel and storing the resulting data in ram before performing the output all at once but one thread at a time. – Meatboy 106 Jan 08 '20 at 14:11
  • According to https://www.tomshardware.com/reviews/ssd-gaming-performance,2991-3.html: "SSDs are different, though. They're built using multiple NAND flash channels attached to a controller, and maximizing the utilization of each channel requires high queue depths" So more threads writing data in parallel seems to be beneficial for SSDs, which I am using. I will give single threaded writing a try, but from past experiences, I've got way better utilization with more threads. – Markus Jan 08 '20 at 14:29
  • Also you should capture `bin` by reference in order to set the correct version of `bin.flushing` to `false`. Here you will flush only once. And capturing `points` by value incurs a useless copy. If you are in C++14 or more, you should use move capture. If in C++11, then move `points` in the thread using an argument of type `vector` in the lambda. – Meatboy 106 Jan 08 '20 at 14:32
  • Yeah, that part was just for the question. For the actual code I'm using, I use pointers to buffers without any copying happening. – Markus Jan 08 '20 at 14:33
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/205610/discussion-between-meatboy-106-and-markus). – Meatboy 106 Jan 08 '20 at 14:35

1 Answers1

3

Assuming add (and for that matter flush) is not called from multiple threads.

When the main goal is:

Multiple threads can be spawned for different bins, but at most one thread per bin.

Then yes, your current solution with a single boolean flag works. Instead of using a plain bool you must use std::atomic<bool> though to avoid issues though!

Daniel Jour
  • 15,896
  • 2
  • 36
  • 63
  • I wanted to avoid any synced/atomic primitive since the checks happen in a very very bussy loop, but I'll check if it does make a performance impact or not. – Markus Jan 08 '20 at 12:55
  • Keep in mind that the atomic check will only be called when a bin reaches over a million elements. It will only be called often, if the saving to disk takes significantly longer than getting a bin to over 1 million points (since only if the bin is still flushing when we have that number, the check will be called multiple times in a row) But then you already have the bigger problem that your saving to disk cannot keep up and hence you will accumulate more and more needed space in RAM for the bins since they cannot be written to disk fast enough. – n314159 Jan 08 '20 at 13:30
  • 1
    You **MUST** use atomics because instructions can be reordered by both the compiler and the CPU around the boolean test. You are entering the tricky world of lock-free programming. I recommand you to take some time to read about c++11 memory model and memory ordering not to end up with inexplicable bugs. – Meatboy 106 Jan 08 '20 at 13:31
  • Just tested and seems like it doesn't even pose a performance impact to use atomic bool, perhaps partially because of what @n314159 said. So I'm fine withing using the atomic. :) Thanks for the help and the link to the issue that outlines why it's necessary. – Markus Jan 08 '20 at 13:53