
Recently I started learning C++11. I only studied C/C++ briefly in college, and I come from another ecosystem (web development), so as you can imagine I'm relatively new to C++.

At the moment I'm studying threads and how to accomplish logging from multiple threads through a single writer (file handle). So I wrote the following code based on tutorials and various articles.

  • My first question: please point out any bad practices / mistakes that I have overlooked (although the code works with VC 2015).
  • Secondly, and this is my main concern: I'm not closing the file handle, and I'm not sure if that causes any issues. If it does, when and how would be the most appropriate way to close it?
  • Lastly, and correct me if I'm wrong: I don't want to "pause" a thread while another thread is writing. I'm writing line by line each time. Is there any case where the output gets messed up at some point?

Thank you very much for your time; below is the source (currently, for learning purposes, everything is inside main.cpp).

#include <iostream>
#include <fstream>
#include <memory>
#include <thread>
#include <string>

static const int THREADS_NUM = 8;

class Logger
{

  public:
    Logger(const std::string &path) : filePath(path)
    {
        this->logFile.open(this->filePath);
    }

    void write(const std::string &data)
    {
        this->logFile << data;
    }

  private:
    std::ofstream logFile;
    std::string filePath;

};

void spawnThread(int tid, std::shared_ptr<Logger> &logger)
{

    std::cout << "Thread " + std::to_string(tid) + " started" << std::endl;

    logger->write("Thread " + std::to_string(tid) + " was here!\n");

}

int main()
{

    std::cout << "Master started" << std::endl;
    std::thread threadPool[THREADS_NUM];

    auto logger = std::make_shared<Logger>("test.log");

    for (int i = 0; i < THREADS_NUM; ++i)
    {
        threadPool[i] = std::thread(spawnThread, i, logger);
        threadPool[i].join();
    }

    return 0;
}

PS1: In this scenario there will always be only 1 file handle open for threads to log data.

PS2: The file handle ideally should close right before the program exits... Should that be done in the Logger destructor?

UPDATE

The current output with 1000 threads is the following:

Thread 0 was here!
Thread 1 was here!
Thread 2 was here!
Thread 3 was here!
.
.
.
.
Thread 995 was here!
Thread 996 was here!
Thread 997 was here!
Thread 998 was here!
Thread 999 was here!

I don't see any garbage so far...


4 Answers


My First question and request would be to point out any bad practices / mistakes that I have overlooked (although the code works with VC 2015).

Subjective, but the code looks fine to me, although you are not synchronizing the threads' writes (a std::mutex in the logger would do the trick).

Also note that this:

std::thread threadPool[THREADS_NUM];

auto logger = std::make_shared<Logger>("test.log");

for (int i = 0; i < THREADS_NUM; ++i)
{
    threadPool[i] = std::thread(spawnThread, i, logger);
    threadPool[i].join();
}

is pointless: you create a thread, join it (i.e. wait for it to finish), and only then create the next one, so nothing ever runs in parallel. I think this is what you are looking for:

std::vector<std::thread> threadPool;

auto logger = std::make_shared<Logger>("test.log");

// create all threads
for (int i = 0; i < THREADS_NUM; ++i)
    threadPool.emplace_back(spawnThread, i, logger);
// after all are created join them
for (auto& th: threadPool)
    th.join();

Now you create all threads first and then wait for all of them, instead of one by one.

Secondly and this is what is my main concern is that I'm not closing the file handle, and I'm not sure If that causes any issues. If it does when and how would be the most appropriate way to close it?

And when do you want to close it? After each write? That would be redundant OS work with no real benefit. The file is supposed to stay open for the entire program's lifetime, so there is no reason to close it manually at all. On a graceful exit, std::ofstream's destructor closes the file; on a non-graceful exit, the OS closes all remaining handles anyway.

Flushing the file's buffer (possibly after each write?) would be helpful, though.

Lastly and correct me if I'm wrong I don't want to "pause" a thread while another thread is writing. I'm writing line by line each time. Is there any case that the output messes up at some point?

Yes, of course. You are not synchronizing writes to the file, so the output might be garbage. You can easily check this yourself: spawn 10000 threads and run the code; it's very likely you will get a corrupted file.

There are many different synchronization mechanisms, but all of them are either lock-free or lock-based (or possibly a mix). In any case, a simple std::mutex (basic lock-based synchronization) in the logger class should be fine.

  • Thank you for your reply, I edited my question. the file should remain open through entire program's lifetime. – 0x_Anakin Nov 23 '17 at 12:05
  • @Syd You don't have to do anything special. `std::ofstream` destructor closes the handle for you. It will fire when there are no more references to `logger`, i.e. at the program exit. – freakish Nov 23 '17 at 12:08
  • Ok, also I tried with 1000 threads but seems fine. – 0x_Anakin Nov 23 '17 at 12:09
  • @Syd That's because your original code **is not** multithreaded. See my update. – freakish Nov 23 '17 at 12:10
  • Oh I didn't realize I was doing that regarding threads! You are right, now each log line seems random (meaning the order it was appended into the file) – 0x_Anakin Nov 23 '17 at 12:17
  • @Syd The order is one thing - you can't really predict it and most of the time it doesn't really matter. But more dangerous is that the file might be corrupted, meaning entries may overlap, e.g. `Thread 1Thread 2 was here! was here!` is certainly a possibility. This is the real issue, you don't want that. This is the real reason you need a mutex (or something). – freakish Nov 23 '17 at 12:24
  • A mutex is a point of serialization and would pause threads, hence doesn't look ideal to me (in the general case). I'd have some queues for the log lines of the different threads and a thread in the looper that takes from the queues and writes on disk. – AndrewBloom Jan 20 '20 at 14:11
  • @AndrewBloom it is impossible to synchronize threads without "pauses", even lock-free solutions have them (often hidden as a busy loop, but its still a pause, in a sense that the thread doesn't do anything meaningful). You can have queues but then you have to synchronize access to them. You can use some lock-free solution, but (1) it is complex, (2) it still pauses threads (although possibly for a shorter period of time) and (3) it's a premature optimization: unless you've actually measured that logging is a performance bottleneck. – freakish Jan 20 '20 at 14:45
  • @freakish, as you said you would need a lock free solution. (1) in case of one producer - one consumer, it's not so complex. check for example https://github.com/google/oboe/blob/master/samples/RhythmGame/src/main/cpp/utils/LockFreeQueue.h . Boost has his own implementations too. (2) It still pauses threads. It does not. These techniques in fact are used in audio programming, where you need a predictable near real time behaviour. (3) it's premature optimisation. It's not. It's bad vs good design. – AndrewBloom Jan 21 '20 at 11:58
  • @AndrewBloom (1) one producer one consumer queue is insanely useless and not the case here (2) it is pausing in a sense that threads do meaningless work. Even atomic primitives wait for each other, its just that the pause is on cpu level, not os. (3) it is premature optimization. It has absolutely nothing to do with any design, this is an implementation detail that can be replaced at any point if needed. Exactly: if needed. Unlikely. – freakish Jan 21 '20 at 12:58
  • @freakish, 1) "insanely useless" based on what? you can substitute a multi-producer one-consumer queue with N one-producer one-consumer queue, it's a common strategy. 2) I suggest you to research the subject lock free programming and wait free programming before doing such assertions. What exactly would pause on cpu level? "threads doing meaningless work"? threads don't stop, so they are free to do whatever (if the programmer is meaningless they may do meaningless work though :) ) – AndrewBloom Jan 21 '20 at 13:51
  • 3) It's bad design, because you use the worker threads to write on the disk. disk can be used by other processes, so it's common to use a new thread for IO operations. It's bad design cause you block unpredictably the worker threads, without any good reason to do that. Imagine you need to debug a function and it becomes randomly slower cause another thread is flooding the logger. Good luck with that. It's bad design cause you serialise heavily the threads, and that's against the fact you usually have threads to parallelise the work. – AndrewBloom Jan 21 '20 at 13:52
  • Also using blocking queues, having a queue for each thread (imagine for example a map of queues based on thread_id) a worker thread would be blocked only by the logger thread and not all the other 999. Obviously using a thread_pool and not 1000 threads would give a very small number of queues. (1000 threads --> probably bad design!) – AndrewBloom Jan 21 '20 at 13:52
  • @AndrewBloom using N queues is not a solution. How will you consume that? Busy loop though all, all the time? That's inefficient and no, it is not even close to "common". Cheating wont work. As for cpu pauses I suggest you read about bus locking. Plus busy loop (often seen in lock-free algorithms) is far from "I can do something else". That's not how those algorithms works. Always measure, never preoptimize. – freakish Jan 22 '20 at 08:09
  • Bad design is wasting time on complicated algorithms without measuring that you actually need them. Btw even with lock-free you need locks to pause consumer when queue is empty. – freakish Jan 22 '20 at 08:16
  • @freakish, let me clarify: I don't criticise your answer to this question, being this question more a coding exercise of a newbie programmer. I just pointed out that a logger in a real world scenario is done in a completely different way. Now, responding to your points: You consume N queues with a for loop. in pseudo-code: for each q in queues: copy content to one array: order elements by timestamp, then write on file. – AndrewBloom Jan 23 '20 at 17:54
  • " Busy loop though all,..." in a pure lock-free implementation, you can use std::this_thread::yield or this_thread::sleep_for, those usually transfer control to operating system that switches threads. If you want to use an atomic, you can wrap the push of a lockfree queue and set an atomic_bool to true and notify. atomic_bool will wake up the logger thread waiting with a condition_variable. I looked at bus locking, people say it's not an issue since pentium pro processor. Let me add that considering hardware specifics, is definitely a premature optimisation. – AndrewBloom Jan 23 '20 at 17:55
  • You're not in control of those elements and they can change without notice (everything has exception, if you develop for embedded systems you may want to consider hd constraints for example). I agree with you on the general quote "Always measure, never preoptimize" though not in this case.Let me end reporting a comment from the Android OS, in their logger source files that you can see here: – AndrewBloom Jan 23 '20 at 17:55
  • https://chromium.googlesource.com/aosp/platform/system/core/+/master/liblog/logger_write.c at lines 147-153 "* Threads that are actively writing at this point are not held back * by a lock and are at risk of dropping the messages with a return code * -EBADF. Prefer to return error code than add the overhead of a lock to * each log writing call to guarantee delivery. In addition, anyone * calling this is doing so to release the logging resources and shut down, * for them to do so with outstanding log requests in other threads is a * disengenuous use of this function." – AndrewBloom Jan 23 '20 at 17:56
  • Your "pure lock-free implementation" is either eating an entire cpu core (busy loop with or without thread yield, doesn't matter) or has lags (sleep_for for how long?). Condition variables require mutexes (locks). You try to cheat again. Plus now you have a complicated solution and still have locks. Finally: lock-free programming *is* all about hardware. Yes, when you do this you have to think about things like bus locking. Yes, it is hard, error prone and hardly ever it is worth it. I think I'm done with discussing this any further. – freakish Jan 23 '20 at 20:32
  • Oh, one more thing you may want to read: https://stackoverflow.com/questions/43540943/using-boost-lockfree-queue-is-slower-than-using-mutexes for naive thinking that lock-free is better. And no, you don't typically see lock-free algorithms "in real world", only in some rare cases. – freakish Jan 23 '20 at 21:02
  • @freakish, let me rephrase that once more so you can understand. You're suggesting to use a mutex to guard a disk operation. Critical sections guarded by mutexes must be as short, as fast and as predictable as possible. This is a golden rule to use mutexes. You're suggesting to guard an operation that is short in code but very slow and completely unpredictable. This is a gigantic mistake from design point of view, and I'm appalled you can't recognise that. You seem to not understand the different time scales of critical sections and atomic operations. – AndrewBloom Jan 24 '20 at 11:59
  • Your guarded section can take seconds, minutes or more even days based on the activity of the disk, an atomic operation takes nanoseconds. It's like comparing the diameter of an onion and the diameter of solar system. You seem to ignore the difference between a busy loop, like `for(i=0; i<1000000; i++);` and `this_thread::yield` the first one wastes a core with useless counting, the latter gives control to the os scheduler. the thread is rescheduled, that means another thread will be executed on that core. Plus, you should read better the link you posted, especially Zulan comments. – AndrewBloom Jan 24 '20 at 12:05

Hello and welcome to the community!

A few comments on the code, and a few general tips on top of that.

  1. Don't use native arrays if you do not absolutely have to.

  2. Replacing the native std::thread[] array with an std::array would allow you to use a range-based for loop, which is the preferred way of iterating over things in C++. An std::vector would also work, since you have to generate the threads (which you can do with std::generate_n in combination with std::back_inserter).

  3. Don't use smart pointers unless you have specific memory-management requirements; in this case a reference to a stack-allocated logger would be fine (the logger would probably live for the duration of the program anyway, hence no need for explicit memory management). In C++ you try to use the stack as much as possible; dynamic memory allocation is slow in many ways, and shared pointers introduce overhead (unique pointers are zero-cost abstractions).

  4. The join in the for loop is probably not what you want: it waits for the previously spawned thread and spawns another one only after it has finished. If you want parallelism you need a second for loop for the joins, but the preferred way would be std::for_each(begin(pool), end(pool), [](auto& thread) { thread.join(); }) or something similar.

  5. Use the C++ Core Guidelines and a recent C++ standard (C++17 at the time of writing); C++11 is old, and you probably want to learn the modern stuff instead of learning how to write legacy code: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

  6. C++ is not Java; use the stack as much as possible - this is one of the biggest advantages of using C++. Make sure you understand by heart how the stack, constructors, and destructors work.
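Putting points 1-4 together, a sketch of how the question's program could look (the Logger here is a stand-in for the question's class, minus the shared_ptr; `runAll` is a hypothetical name for what would be the body of main, and the writes are still unsynchronized, which other answers address):

```cpp
#include <algorithm>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Stand-in for the question's Logger, stack-allocated instead of shared_ptr.
class Logger
{
  public:
    explicit Logger(const std::string &path) : logFile(path) {}
    void write(const std::string &data) { logFile << data; }
  private:
    std::ofstream logFile;
};

void spawnThread(int tid, Logger &logger)
{
    logger.write("Thread " + std::to_string(tid) + " was here!\n");
}

void runAll() // hypothetical: what main's body could look like
{
    Logger logger("test.log"); // point 3: plain stack object, no shared_ptr

    std::vector<std::thread> pool; // point 1: no native array

    // point 2: generate the threads straight into the vector
    int tid = 0;
    std::generate_n(std::back_inserter(pool), 8, [&] {
        return std::thread(spawnThread, tid++, std::ref(logger));
    });

    // point 4: join in a separate pass, so the threads actually run in parallel
    std::for_each(begin(pool), end(pool), [](std::thread &t) { t.join(); });
} // logger goes out of scope here; its ofstream member closes the file
```

Note the std::ref: std::thread copies its arguments by default, and Logger is not copyable (its ofstream member isn't), so the reference has to be passed explicitly.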

aerkenemesis

The first massive mistake is saying "it works with MSVC, I see no garbage", even more so as it only works because your test code is not actually concurrent (it isn't broken per se, but since nothing runs in parallel, of course it works fine).

But even if the code were concurrent, saying "I don't see anything wrong" is a terrible mistake. Multithreaded code is not correct just because you don't see anything wrong; it is incorrect unless proven correct.

The goal of not blocking ("pausing") one thread while another is writing is unachievable if you want correctness, at least if they concurrently write to the same descriptor. You must synchronize properly (call it whatever you like, and use any method you like), or the behavior will be incorrect. Or worse, it will look correct for as long as you look at it, and then behave wrongly six months later when your most important customer uses it for a multi-million-dollar project.

Under some operating systems, you can "cheat" and get away without explicit synchronization, as these offer syscalls with atomicity guarantees (e.g. writev). That is, however, not what you may think it is: it is indeed heavyweight synchronization, you just don't see it.

A better (more efficient) strategy than using a mutex or atomic writes might be to have a single consumer thread that writes to disk, and to push log tasks onto a concurrent queue from as many producer threads as you like. This has minimum latency for the threads you don't want to block, and blocking only where you don't care. Plus, you can coalesce several small writes into one.
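A sketch of that strategy under some stated assumptions: the standard library has no concurrent queue, so a mutex-and-condition-variable guarded std::queue stands in for one, and "QueueLogger" and its members are illustrative names, not from the question. Producers only pay for a short queue push; all disk I/O happens on the single consumer thread:

```cpp
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class QueueLogger
{
  public:
    explicit QueueLogger(const std::string &path)
        : logFile(path), writer(&QueueLogger::run, this) {}

    ~QueueLogger()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            done = true;
        }
        cv.notify_one();
        writer.join(); // drains remaining entries before the file closes
    }

    // Called by any producer thread: just a queue push, no disk I/O.
    void write(std::string line)
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            queue.push(std::move(line));
        }
        cv.notify_one();
    }

  private:
    // Single consumer: the only thread that ever touches the file.
    void run()
    {
        std::unique_lock<std::mutex> lock(mtx);
        for (;;)
        {
            cv.wait(lock, [this] { return done || !queue.empty(); });
            while (!queue.empty())
            {
                std::string line = std::move(queue.front());
                queue.pop();
                lock.unlock(); // write to disk without holding the lock
                logFile << line;
                lock.lock();
            }
            if (done)
                break;
        }
    }

    std::ofstream logFile;
    std::mutex mtx;
    std::condition_variable cv;
    std::queue<std::string> queue; // guarded by mtx
    bool done = false;             // guarded by mtx
    std::thread writer;            // declared last: starts after the members above exist
};
```

Since only the consumer writes to the file, every queued line lands intact, and the destructor drains whatever is still queued before the stream closes.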

Closing or not closing a file seems like a non-issue. After all, when the program exits, files are closed anyway. Well yes, except there are three layers of caching (four, actually, if you count the physical disk's caches), two of them within your application and one within the operating system.

When data has made it at least into the OS buffers, all is good unless power fails unexpectedly. Not so for the other two levels of cache! If your process dies unexpectedly, its memory is released, which includes anything cached within iostream and anything cached within the CRT. So if you need any amount of reliability, you will either have to flush regularly (which is expensive) or use a different strategy. File mapping may be such a strategy, because whatever you copy into the mapping is automatically (by definition) within the operating system's buffers, and unless power fails or the computer explodes, it will be written to disk.

That being said, there exist dozens of free and readily available logging libraries (e.g. spdlog) which do the job very well. There's really not much reason to reinvent this particular wheel.

Damon

The first question is subjective, so someone else may want to give advice on it, but I don't see anything awful.

Nothing in the C++ standard library is thread-safe except in some rare cases. A good answer on using ofstream in a multithreaded environment is given here.

Not closing a file is indeed an issue. You should get familiar with RAII, as it is one of the first things to learn. The answer by Detonar is a good piece of advice.
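The RAII point in code, addressing the question's PS2: no explicit destructor (and no close() call) is needed, because the handle's lifetime is tied to the Logger object (a sketch mirroring the question's class; treat it as illustrative):

```cpp
#include <fstream>
#include <string>

class Logger
{
  public:
    explicit Logger(const std::string &path) : logFile(path) {}
    void write(const std::string &data) { logFile << data; }
    // No user-written destructor needed: ~Logger implicitly runs ~ofstream,
    // which flushes and closes the handle (RAII).
  private:
    std::ofstream logFile;
};
```

When the Logger goes out of scope (or, held in a shared_ptr as in the question, when the last reference disappears at program exit), the file is flushed and closed automatically.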

Volodymyr Lashko
  • Why would not closing a file be an issue? OS will close all remaining file handles on exit anyway. Also since RAII won't protect you from things like SIGKILL I don't see a reason to implement it at all. – freakish Nov 23 '17 at 11:52
  • 1
    `std::ofstream` closes the file on destruction which happens to `Logger`'s member automatically on `Logger`'s destruction or what do you mean? – bipll Nov 23 '17 at 12:00
  • Sure, `ofstream`'s destructor will call `close()`. My point is that it would be more beneficial to know RAII than just find a solution for using `ofstream` in particular. – Volodymyr Lashko Nov 23 '17 at 12:30