Thread static class methods vs global scope

Question

Imagine an application feature that requires up to 5 threads crunching data; these threads use buffers, mutexes and events to interact with each other. Performance is critical, and the language is C++.

The feature can be implemented as one (compilation) unit with one class, and only one instance of this class can be instantiated for the application. The class itself implements 1 of the threads in its run() method, which spawns the other 4 threads, manages them, and joins them when the user closes the application.

What is the advantage of choosing one of the following methods over another (please do let me know of any better approach)?

1. Add 5 static methods to the class, each running a single thread, with mutexes and other shared data as static class variables.
2. Add 5 global functions (no scope) and use global variables, events and mutexes (as if it were C).
3. Change the pattern entirely: add 4 more classes, each implementing one of the threads, and share data via global variables.

Here are some thoughts and issues to be considered (please correct them if they are wrong):

Having threads as class members (static of course), they can rely on the singleton to access non-static member functions, it also gives them a namespace which by itself seems a good idea.
Using static class methods, the class header file will soon contain many static variables (and other helper static methods). Having to declare variables in the class header file may bring additional dependencies to other units that include the header file. If the variables were declared globally, they could be hidden in a separate header file.
Static class variables must also be defined somewhere in the code, so every declaration has to be typed twice.
Compilers can take advantage of the namespace resolution for more optimized code (as opposed to global variables possibly in different units).
The single unit can potentially be better optimized, whereas whole program optimization is slow and probably less fruitful.
If the unit grows I have to move some part of the code to a separate unit, so I will have one class with multiple (compilation) units. Is this an anti-pattern or not?
If using more than one class, each handling one thread, the same question arises again: static methods or global functions to implement the threads? In addition, this requires more lines of code. That is not a real issue, but is it worth the additional overhead?

Please answer this assuming no library such as Qt, and then assuming that we can rely on QThread and implement one thread per run() method.

Edit1: The number of threads is fixed per design, number 5 is just an example. Please share your thoughts on the approaches/patterns and not on details.

Edit2: I have found this answer (to a different question) very helpful, I guess the first approach misuses classes as namespaces. Second approach can be mitigated if coupled with namespace.

I don't like plan 1, 2 or 3. They all seem to have the magic number '5', (or 5-1), in them. This begins to smell of thread micro-management and lack of flexibility. If performance is critical, it doesn't matter much how you cobble together the code - what matters is how you cobble together the data. What data are you operating on? Can it be simply divided up between threads to reduce locking and false sharing? Where does the input data come from and where does the output go? — Martin James, Feb 17 '12 at 00:38
+1 @MartinJames - Martin is right: you are thinking of the problem in the wrong way. You need to decide "how can I divide up my data in such a way that calculations on the piece are independent"? Thinking about input and output is also important: if you do it the right way, you might not even need a lock. I highly recommend that you build a simplified model and test it thoroughly to ensure that you understand how threading and locking works before implementing it for real. — kfmfe04, Feb 17 '12 at 01:44
@MartinJames - Thanks for the answer, actually in one project I am using approach 1. It has networking, video capture, encoding, decoding, tracking and rendering. I have divided the tasks assuming a 4 core processor, and that is where the magic number 5 comes from. Meanwhile, I am using IPP, OpenMP, and OpenGL direct rendering, and I can get below 3ms, which means a good fps. I have ring buffers, and locks are used only when one of the threads is behind, so it is almost lock-less. — dashesy, Feb 17 '12 at 06:26
@kfmfe04 - Application is working fine, only now I have a fat cpp file, a header file, and as I said including it brings some dependency to other units reducing inter-dependence of the classes. I have some other projects with the second approach, but to me they are C disguised as C++. I also have some Java projects with approaches similar to the third method. So I was wondering what is the rationale behind each of those patterns, and what other people use. — dashesy, Feb 17 '12 at 06:27

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

Sources

First, you should read the whole series of concurrency articles by Herb Sutter:

http://herbsutter.com/2010/09/24/effective-concurrency-know-when-to-use-an-active-object-instead-of-a-mutex/

This is the link to the last article's post, which contains the links to all the previous articles.

What's your case?

According to the following article: How Much Scalability Do You Have or Need? ( http://drdobbs.com/parallel/201202924 ), you are in the O(K): Fixed case. That is, you have a fixed set of tasks to be executed concurrently.

By the description of your app, you have 5 threads, each one doing a very different thing, so you must have your 5 threads, perhaps hoping one or some among those can still divide their tasks into multiple threads (and thus, using a thread pool), but this would be a bonus.

I'll let you read the article for more information.

Design questions

About the singleton

Forget the singleton. This is a dumb, overused pattern.

If you really really want to limit the number of instances of your class (and seriously, haven't you something better to do than that?), you should separate the design in two: one class for the data, and one class that wraps the previous class to enforce the singleton limitation.
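As a rough illustration of that split, here is a minimal sketch. The names Pipeline and PipelineSingleton are hypothetical and not from the question; the wrapper relies on a function-local static, whose initialization is thread-safe since C++11.

```cpp
#include <string>
#include <utility>

// Hypothetical plain class holding the data and logic.
// Nothing here limits the number of instances, so it stays easy to test.
class Pipeline
{
   public :
      explicit Pipeline(std::string p_name) : m_name(std::move(p_name)) {}

      const std::string & name() const { return m_name ; }

   private :
      std::string m_name ;
} ;

// Hypothetical thin wrapper whose only job is the "one instance" restriction.
class PipelineSingleton
{
   public :
      PipelineSingleton() = delete ;

      static Pipeline & instance()
      {
         // Function-local static: constructed once, on first use
         // (initialization is thread-safe since C++11).
         static Pipeline s_instance("main") ;
         return s_instance ;
      }
} ;
```

Code that needs the application-wide object calls PipelineSingleton::instance(), while tests can still create as many independent Pipeline objects as they like.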

About compilation units

Make your headers and sources easy to read. If you need to split the implementation of a class across multiple source files, then so be it. I name the sources accordingly. For example, for a class MyClass, I would have:

MyClass.hpp : the header
MyClass.cpp : the main source (with constructors, etc.)
MyClass.Something.cpp : source dealing with something
MyClass.SomethingElse.cpp : source dealing with something else
etc.

About compiler optimisations

Recent compilers are able to inline code across compilation units (I saw that option in Visual C++ 2008, IIRC). I don't know whether whole-program optimization works worse than "one unit" compilation, but even if it does, you can still divide your code into multiple sources and then have one global source include everything. For example:

MyClassA.header.hpp
MyClassB.header.hpp
MyClassA.source.hpp
MyClassB.source.hpp
global.cpp

and then do your includes accordingly. But you should be sure this actually makes your performance better: Don't optimize unless you really need it and you profiled for it.

Your case, but better?

Your question and comments are more about monolithic design than about performance or threading issues, so I could be wrong, but what you need is a simple refactoring.

I would use the 3rd method (one class per thread), because with classes comes private/public access, and thus, you can use that to protect the data owned by one thread only by making it private.

The following guidelines could help you:

1 - Each thread should be hidden in one non-static object

To start the thread, you can use either a private static method of that class or a function in an anonymous namespace (I would go for the function, but here I want to access a private function of the class, so I will settle for the static method).

Usually, thread construction functions let you pass a pointer to a function with a void * context parameter, so use that to pass your this pointer to the main thread function:

Having one class per thread helps you isolate that thread, and thus, that thread's data from the outer world: No other thread will be able to access that data as it is private.

Here's some code:

// Some fictitious thread API
typedef void (*MainThreadFunction)(void * p_context) ;
ThreadHandle CreateSomeThread(MainThreadFunction p_function, void * p_context) ;

// class header
class MyClass
{
   public :
      MyClass() ;
      // etc.

      void         run() ;

   private :
      ThreadHandle m_handle ;

      static void  threadMainStatic(void * p_context) ;
      void         threadMain() ;
} ;


// source
void MyClass::run()
{
   this->m_handle = CreateSomeThread(&MyClass::threadMainStatic, this) ;
}

void MyClass::threadMainStatic(void * p_context)
{
   static_cast<MyClass *>(p_context)->threadMain() ;
}

void MyClass::threadMain()
{
   // Do the work
}

Disclaimer: This wasn't tested in a compiler. Take it as pseudo C++ code more than actual code. YMMV.
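For comparison, the same idea can be sketched with std::thread from C++11, which accepts a member function pointer plus the object pointer directly, so the static trampoline is not needed. This is only a sketch under assumptions: the class name Worker, the join() and result() members, and the toy workload are hypothetical and not part of the code above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker: one object hides one thread and the data it owns.
class Worker
{
   public :
      Worker() = default ;
      ~Worker() { join() ; }   // never destroy a std::thread that is still joinable

      Worker(const Worker &) = delete ;
      Worker & operator=(const Worker &) = delete ;

      // Starts the thread, unless one was already started and not yet joined.
      void run()
      {
         if(!m_thread.joinable())
         {
            m_thread = std::thread(&Worker::threadMain, this) ;
         }
      }

      // Waits for the thread to finish, if one is running.
      void join()
      {
         if(m_thread.joinable())
         {
            m_thread.join() ;
         }
      }

      int result() const { return m_result.load() ; }

   private :
      // The thread body: private, so only this object can start it.
      void threadMain()
      {
         int sum = 0 ;
         for(int i = 1 ; i <= 10 ; ++i)
         {
            sum += i ;
         }
         m_result.store(sum) ;   // publish the result to other threads
      }

      std::thread      m_thread ;
      std::atomic<int> m_result{0} ;
} ;
```

The caller creates a Worker, calls run(), and later join(); the private section keeps both the thread handle and the thread body out of reach of the other threads.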

2 - Identify the data that is not shared.

This data can be hidden in the private section of the owning object. If it is currently protected by synchronization, that protection is overkill (as the data is NOT shared).

3 - Identify the data that is shared

... and verify its synchronization (locks, atomic access)
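A minimal sketch of what that can look like, assuming C++11 and hypothetical names (SharedResults, fillFromTwoThreads): the shared container is only reachable through member functions that take the lock, a simple counter uses atomic access, and each thread's loop variable stays unshared and unlocked.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: every access goes through a synchronized member.
class SharedResults
{
   public :
      void add(int p_value)
      {
         {
            std::lock_guard<std::mutex> lock(m_mutex) ;   // lock guards the vector
            m_values.push_back(p_value) ;
         }
         m_processed.fetch_add(1) ;   // a single counter: atomic access is enough
      }

      std::size_t size() const
      {
         std::lock_guard<std::mutex> lock(m_mutex) ;
         return m_values.size() ;
      }

      int processed() const { return m_processed.load() ; }

   private :
      mutable std::mutex m_mutex ;
      std::vector<int>   m_values ;
      std::atomic<int>   m_processed{0} ;
} ;

// Hypothetical usage: two threads fill the same SharedResults object.
inline std::size_t fillFromTwoThreads(int p_countPerThread)
{
   SharedResults results ;

   auto produce = [&results, p_countPerThread]()
   {
      // i is local to each thread, so it needs no protection at all
      for(int i = 0 ; i < p_countPerThread ; ++i)
      {
         results.add(i) ;
      }
   } ;

   std::thread a(produce) ;
   std::thread b(produce) ;
   a.join() ;
   b.join() ;
   return results.size() ;
}
```

Because the mutex and the vector are private, there is no way to touch the shared data without going through the locked accessors, which makes the synchronization easy to audit.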

4 - Each class should have its own header and source

... and protect the access to its (shared) data with synchronization, if necessary

5 - Protect the access as much as possible

If one function is used by a class, and only a class, and does not really need access to the class internals, then it could be hidden in an anonymous namespace.
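As a small sketch of that first point (Logger and decorate are hypothetical names, and in a real project the class declaration would sit in its own header): the helper gets internal linkage, so no other compilation unit can call it or depend on it.

```cpp
#include <string>

// Hypothetical class; normally declared in Logger.hpp.
class Logger
{
   public :
      std::string format(const std::string & p_message) const ;
} ;

// Everything below would live in Logger.cpp.
namespace
{
   // Internal linkage: invisible outside this source file,
   // and it needs no access to Logger's internals.
   std::string decorate(const std::string & p_text)
   {
      return "[" + p_text + "]" ;
   }
}

std::string Logger::format(const std::string & p_message) const
{
   return decorate(p_message) ;
}
```

The header never mentions decorate, so changing or removing it does not force any other unit to recompile.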

If one variable is owned by only a thread, hide it in the class as a private variable member.

etc.