I have a function that is called millions of times, and the work done by this function is multithreaded. Here is the function:

void functionCalledSoManyTimes()
{
  for (int i = 0; i < NUM_OF_THREADS; i++)
  {
    pthread_create(&threads[i], &attr, thread_work_function, (void *)&thread_data[i]);
  }
  // wait
}

I'm creating the threads each time the function is called, and I give each thread its data struct (set once at the beginning of the algorithm) to use in thread_work_function. The thread_work_function simply processes a series of arrays, and the thread_data struct contains pointers to those arrays and the indices that each thread is responsible for.

Although multithreading the algorithm in this way improved performance by more than 20%, my profiling shows that the repeated calls to pthread_create cause significant overhead.

My question is: Is there a way to achieve my goal without calling pthread_create each time the function is called?

Problem Solved.

Thank you guys, I really appreciate your help! I've written a solution here using your tips.

mota
    You need to look at the *thread pool* concept. Here you have a pool of threads that wait for work. You schedule the work, say in a queue that the threads are blocked on, and one thread picks up each item and executes it. This is a fairly common concept, and if you have the possibility, pull in this functionality from something like the APR (Apache Portable Runtime). – Nim Sep 04 '12 at 17:31

2 Answers

Just start a fixed set of threads and use an inter-thread communication system (ring buffer, for instance) to pass the data to process.
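
A minimal sketch of the fixed-set-of-threads idea, under some assumptions: since each thread's thread_data is set once and never changes, the sketch does not pass data through a ring buffer but only signals "run one more round" with a mutex, two condition variables, and a generation counter. The layout of thread_data_t, the placeholder work, and the names worker_loop, start_workers, and stop_workers are hypothetical and not from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_OF_THREADS 4

/* Hypothetical layout; the question only says it holds array pointers and indices. */
typedef struct { int *array; int begin; int end; } thread_data_t;

static pthread_t threads[NUM_OF_THREADS];
static thread_data_t thread_data[NUM_OF_THREADS];

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t work_done = PTHREAD_COND_INITIALIZER;
static unsigned long generation = 0; /* incremented once per round of work */
static int pending = 0;              /* workers that have not finished the round */
static bool shutting_down = false;

/* Placeholder work: add 1 to this thread's slice of the array. */
static void *thread_work_function(void *arg)
{
  thread_data_t *d = arg;
  for (int i = d->begin; i < d->end; i++)
    d->array[i] += 1;
  return NULL;
}

/* Each worker is created once and then sleeps until a new round is published. */
static void *worker_loop(void *arg)
{
  unsigned long seen = 0;
  for (;;) {
    pthread_mutex_lock(&lock);
    while (generation == seen && !shutting_down)
      pthread_cond_wait(&work_ready, &lock);
    if (shutting_down) {
      pthread_mutex_unlock(&lock);
      return NULL;
    }
    seen = generation;
    pthread_mutex_unlock(&lock);

    thread_work_function(arg);

    pthread_mutex_lock(&lock);
    if (--pending == 0)
      pthread_cond_signal(&work_done);
    pthread_mutex_unlock(&lock);
  }
}

/* Call once, after thread_data has been filled in. */
void start_workers(void)
{
  for (int i = 0; i < NUM_OF_THREADS; i++) {
    int rc = pthread_create(&threads[i], NULL, worker_loop, &thread_data[i]);
    assert(rc == 0);
    (void)rc;
  }
}

/* The hot path: no pthread_create, just wake the workers and wait for them. */
void functionCalledSoManyTimes(void)
{
  pthread_mutex_lock(&lock);
  pending = NUM_OF_THREADS;
  generation++;
  pthread_cond_broadcast(&work_ready);
  while (pending > 0)
    pthread_cond_wait(&work_done, &lock);
  pthread_mutex_unlock(&lock);
}

/* Call once at the end to let the workers exit, then join them. */
void stop_workers(void)
{
  pthread_mutex_lock(&lock);
  shutting_down = true;
  pthread_cond_broadcast(&work_ready);
  pthread_mutex_unlock(&lock);
  for (int i = 0; i < NUM_OF_THREADS; i++)
    pthread_join(threads[i], NULL);
}
```

The intended usage is start_workers() once, then any number of calls to functionCalledSoManyTimes(), then stop_workers(). Because the state is static, this version is only safe if functionCalledSoManyTimes is called from one thread at a time.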

ziu

Solving the problem gracefully is not so easy. You can use static storage for a thread pool, but then what happens if functionCalledSoManyTimes itself can be called from multiple threads? It's not a good design.

What I would do to handle this sort of situation is create a thread-local storage key with pthread_key_create on the first call (using pthread_once), and store your thread pool there with pthread_setspecific the first time functionCalledSoManyTimes gets called in a given thread. You can provide a destructor function to pthread_key_create which will get called when the thread exits, and this function can then be responsible for signaling the worker threads in the thread pool to terminate themselves (via pthread_cancel or some other mechanism).
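
The key handling described above might look roughly like the following sketch. It only shows the pthread_once / pthread_key_create / pthread_setspecific plumbing; thread_pool_t is a hypothetical stand-in (a call counter) for a real pool of workers, and get_thread_pool, pool_destroy, caller_thread, and pools_destroyed are illustrative names, not part of any library.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a real pool; a real one would own worker threads. */
typedef struct { int calls; } thread_pool_t;

static pthread_key_t pool_key;
static pthread_once_t pool_key_once = PTHREAD_ONCE_INIT;

/* Illustration only: counts how many per-thread pools have been torn down. */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int pools_destroyed = 0;

/* Destructor registered with the key; runs when a thread that owns a pool exits.
   A real pool would tell its workers to terminate and join them here. */
static void pool_destroy(void *p)
{
  free(p);
  pthread_mutex_lock(&stats_lock);
  pools_destroyed++;
  pthread_mutex_unlock(&stats_lock);
}

/* Runs exactly once per process, no matter how many threads race to get here. */
static void make_pool_key(void)
{
  int rc = pthread_key_create(&pool_key, pool_destroy);
  assert(rc == 0);
  (void)rc;
}

/* Returns the calling thread's pool, creating it on that thread's first call. */
static thread_pool_t *get_thread_pool(void)
{
  pthread_once(&pool_key_once, make_pool_key);
  thread_pool_t *pool = pthread_getspecific(pool_key);
  if (pool == NULL) {
    pool = calloc(1, sizeof *pool);
    assert(pool != NULL);
    pthread_setspecific(pool_key, pool);
  }
  return pool;
}

void functionCalledSoManyTimes(void)
{
  thread_pool_t *pool = get_thread_pool();
  pool->calls++; /* placeholder for handing work to the pool and waiting */
}

/* Example caller: calls the function three times and reports its pool's count. */
static void *caller_thread(void *arg)
{
  (void)arg;
  for (int i = 0; i < 3; i++)
    functionCalledSoManyTimes();
  return (void *)(intptr_t)get_thread_pool()->calls;
}
```

One caveat with this design: key destructors run when a thread exits via pthread_exit or by returning from its start routine, but not when the process ends because main returns, so a pool owned by the main thread would need separate cleanup if that matters.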

R.. GitHub STOP HELPING ICE
  • Sometimes you can boot the problem upstairs - require the caller of `functionCalledSoManyTimes` to pass in a thread pool that's created/destroyed by functions you provide for them. This is an easy change if the caller's code looks like `for (int i = 0; i < SOMANY; ++i) functionCalledSoManyTimes();`, not easy if the function is called from many unrelated sites, and easy but useless if you want to hide the fact that it's multi-threaded from the caller. – Steve Jessop Sep 04 '12 at 18:34
  • That's also a nice solution, but I suspect it's often not applicable, especially if you want to keep the implementation of `functionCalledSoManyTimes` opaque/encapsulated. – R.. GitHub STOP HELPING ICE Sep 04 '12 at 22:31
  • Agreed. I mention it because I'd want to rule it out before introducing hidden shared state, whether that's static or thread-local. If I fail to rule it out, that's a bonus. Another possibility would be to tell the caller that what they're creating and passing in is a `ManyTimesFunctionOptimizer`. That secretly contains a thread pool, and your function implementation is encapsulated other than the fact that it can optionally re-use this object to get a speed boost. Whether that object contains a thread pool or a memoization cache (or both) is opaque to the caller. – Steve Jessop Sep 05 '12 at 08:04
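
The caller-owned object discussed in these comments could be sketched as follows. This is only one possible shape, with assumptions: the name `ManyTimesFunctionOptimizer` comes from the comment, but `optimizer_create`, `optimizer_destroy`, the `rounds` placeholder, and the use of `pthread_barrier_t` (an optional POSIX feature, not available on every platform) are choices made for this illustration. In a real header the struct would stay opaque; it is defined in full here so the sketch is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_OF_THREADS 4

typedef struct ManyTimesFunctionOptimizer ManyTimesFunctionOptimizer;

typedef struct {
  ManyTimesFunctionOptimizer *owner;
  int index;
} worker_arg_t;

struct ManyTimesFunctionOptimizer {
  pthread_t threads[NUM_OF_THREADS];
  worker_arg_t args[NUM_OF_THREADS];
  pthread_barrier_t start;     /* releases the workers for one round */
  pthread_barrier_t done;      /* lets the caller wait for the round to finish */
  int stop;                    /* set by the owner before the final start wait */
  long rounds[NUM_OF_THREADS]; /* placeholder output: rounds run per worker */
};

static void *optimizer_worker(void *p)
{
  worker_arg_t *a = p;
  ManyTimesFunctionOptimizer *opt = a->owner;
  for (;;) {
    pthread_barrier_wait(&opt->start);
    if (opt->stop)
      return NULL;
    opt->rounds[a->index]++; /* placeholder for the real per-thread work */
    pthread_barrier_wait(&opt->done);
  }
}

/* The caller creates this once; the threads inside are an implementation detail. */
ManyTimesFunctionOptimizer *optimizer_create(void)
{
  ManyTimesFunctionOptimizer *opt = calloc(1, sizeof *opt);
  if (opt == NULL)
    return NULL;
  /* Each barrier is shared by all workers plus the calling thread. */
  pthread_barrier_init(&opt->start, NULL, NUM_OF_THREADS + 1);
  pthread_barrier_init(&opt->done, NULL, NUM_OF_THREADS + 1);
  for (int i = 0; i < NUM_OF_THREADS; i++) {
    opt->args[i].owner = opt;
    opt->args[i].index = i;
    int rc = pthread_create(&opt->threads[i], NULL, optimizer_worker, &opt->args[i]);
    assert(rc == 0);
    (void)rc;
  }
  return opt;
}

/* One round: release the workers, then wait until all of them have finished. */
void functionCalledSoManyTimes(ManyTimesFunctionOptimizer *opt)
{
  pthread_barrier_wait(&opt->start);
  pthread_barrier_wait(&opt->done);
}

void optimizer_destroy(ManyTimesFunctionOptimizer *opt)
{
  opt->stop = 1;
  pthread_barrier_wait(&opt->start); /* workers wake up, see stop, and return */
  for (int i = 0; i < NUM_OF_THREADS; i++)
    pthread_join(opt->threads[i], NULL);
  pthread_barrier_destroy(&opt->start);
  pthread_barrier_destroy(&opt->done);
  free(opt);
}
```

Since all state lives in the object, two threads can each hold their own optimizer and call the function concurrently without any static or thread-local storage; sharing one optimizer between threads would still need external locking.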