
I would like to implement a multithreaded process that is in charge of launching threads in parallel.

According to htop's output, each thread consumes less than 1% CPU, but the main thread consumes around 100% CPU.

/* Note: stEngine_t, stTask_t, the globals engine, exitReq and array,
   and the signal-handler setup are defined elsewhere. */
static void* my_handler(void* params);

int main (int argc, char *argv[])
{
    struct sigaction action;
    int i;
    exitReq = 0;
    memset(&engine, 0, sizeof(stEngine_t));
    engine.NbTasks = 12;

    engine.TaskThread = malloc(engine.NbTasks * sizeof(stTask_t));

    /* NbTasks = 12 */
    for (i = 0; i < engine.NbTasks; i++) {
        engine.TaskThread[i] = array[i];
        engine.TaskThread[i].initTask();
        pthread_create(&engine.TaskThread[i].tId, NULL, my_handler, (void *) &engine.TaskThread[i]);
    }

    while (!exitReq) {
        //.. do stuff as reading external value (if value < limit => exitReq = 1)
        sched_yield();
    }

    for (i = 0; i < engine.NbTasks; i++) {
        (void)pthread_cancel(engine.TaskThread[i].tId);
        pthread_join(engine.TaskThread[i].tId, NULL);
        engine.TaskThread[i].stopTask();
        engine.TaskThread[i].tId = 0;
    }
    free(engine.TaskThread);
    memset(&engine, 0, sizeof(stEngine_t));          
    return 0;
}

static void* my_handler(void* params)
{
    stTask_t* ptask = (stTask_t*) params;

    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);

    while (!exitReq) {
        ptask->launchTask();
        pthread_testcancel();
    }
    pthread_exit(NULL);
}

The sched_yield man page says "sched_yield() causes the calling thread to relinquish the CPU.", which is why it is used inside the loop.

I probably misunderstood something about the sched_yield() function, but is there a better and more reliable way to relinquish the CPU in this specific situation?

ogs
  • It's still a busy loop; why wouldn't it use all the CPU if nobody else wants it? – Sami Kuhmonen Oct 24 '16 at 07:24
  • Indeed, the loop works but its task should not consume all the CPU. That's what I want to understand and improve – ogs Oct 24 '16 at 07:46
  • OT: Concurrent access to `exitReq` should be protected, by a mutex for example. – alk Oct 24 '16 at 07:51
  • I would suggest posting a minimal version of the code for your question which can reproduce the problem. In this case, probably the `while` loop is all you need. – Luis de Arquer Oct 24 '16 at 08:02
  • @alk, not off-topic at all. Data races are not OK. https://www.securecoding.cert.org/confluence/display/c/CON43-C.+Do+not+allow+data+races+in+multithreaded+code – Michael Foukarakis Oct 24 '16 at 08:04
  • @alk thanks for your remark. So `exitReq` should be protected by a mutex, even if it is updated only by the main loop (or by the signal handler)? – ogs Oct 25 '16 at 15:58
  • Any variable needs to be protected if concurrent read/write (or write-only) access is possible. Purely concurrent read-only access doesn't need to be protected. – alk Oct 25 '16 at 17:25
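
For illustration, one common way to make a stop flag like `exitReq` safe to set from a signal handler and to read from other threads is a lock-free atomic. This is only a sketch under that assumption, not the question's actual declaration:

#include <signal.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile atomic_int exitReq;        /* lock-free atomics may be used from a signal handler in C11 */

static void on_sigint(int signo)
{
    (void)signo;
    atomic_store(&exitReq, 1);             /* request shutdown from the handler */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, NULL);

    while (!atomic_load(&exitReq))         /* any thread can read the flag without a data race */
        sleep(1);

    puts("exit requested");
    return 0;
}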

2 Answers


sched_yield() doesn't put your thread to sleep. It just pushes it to the back of the queue of runnable threads. Your thread is still a running thread, and it will be put to run again as soon as there are no other threads ahead of it in the queue.

What happens is that if your thread is the only runnable one in the queue, it will be rescheduled immediately every time, using 100% of the CPU.

Probably you want to put your thread to sleep, either by calling sleep() directly, or by block-waiting for some event (e.g. with poll()).
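
A minimal sketch of the block-waiting idea, assuming the work can be driven by a file descriptor becoming readable (the descriptor, the 2-second timeout and the loop body are placeholders, not the question's actual code):

#include <poll.h>
#include <unistd.h>

int main(void)
{
    struct pollfd pfd = { .fd = 0, .events = POLLIN };   /* fd 0 = stdin, purely for illustration */
    int exitReq = 0;

    while (!exitReq) {
        int ready = poll(&pfd, 1, 2000);                 /* blocks here; no CPU is burned while waiting */
        if (ready > 0 && (pfd.revents & POLLIN)) {
            char buf[128];
            if (read(pfd.fd, buf, sizeof buf) <= 0)      /* EOF or error: time to stop */
                exitReq = 1;
            /* otherwise: handle the data that just arrived */
        } else if (ready == 0) {
            /* timeout expired: do the periodic work instead of spinning */
        }
    }
    return 0;
}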

Luis de Arquer
  • But there are 12 other running threads in the queue. – Domso Oct 24 '16 at 08:21
  • @Domso We can't actually see what they are doing (I can't see the code for whatever `launchTask()` points to). Maybe they take up some CPU time, or maybe they don't (the OP suggests 1%, so not much); what Luis is saying is that the main loop/thread will hoover up any spare CPU. – code_fodder Oct 24 '16 at 08:30
  • @code_fodder I agree, but I think the other 12 threads are actually not running permanently. We do not know for sure, but if a function does not use any resources while being executed inside a loop, the function should contain some waiting, too. However, `launchTask()` should be provided. – Domso Oct 24 '16 at 10:06
  • Indeed, the other 12 threads are not permanently running, because each of them is in charge of executing a specific task periodically (every ~1-2 seconds). I modified the code by adding a sleep into the main loop (2 seconds) and the CPU consumption fell to 5% (max). The load increases if I reduce the sleep duration. The task in the main loop needs to be executed very quickly. Luis, could you give more information about poll() usage in this context? – ogs Oct 25 '16 at 16:02
  • @SnP It depends on what the loop does, which we don't know. A typical case is a loop that processes some data as soon as it becomes available, so the solution is blocking until then (there are a [jillion](https://www.youtube.com/watch?v=whUvTIzcE_U) possibilities here, like `read()` for a single file descriptor, `poll()/select()` for several file descriptors, or `pthread_cond_wait()` for waiting on another thread, as code_fodder said). – Luis de Arquer Oct 25 '16 at 20:00
  • @SnP If, however, you don't depend on any data being available, you can probably specify how fast your code should loop (e.g. a render engine that needs 30 fps). So you can just use `sleep()/usleep()`, or for fine control on different machines you can implement throttling: http://stackoverflow.com/questions/28008549/limit-while-loop-to-run-at-30-fps-using-a-delta-variable-c – Luis de Arquer Oct 25 '16 at 20:03
  • @LuisdeArquer Thanks for your clarification! The loop makes a few requests over the D-Bus protocol in order to retrieve external values; depending on these values, the program must be properly stopped. I decided to create the reading thread in the main loop. After detaching it by invoking `pthread_detach()`, I use `pthread_cond_wait()` in the while loop. The condition is validated by the launching thread and protected by a mutex. Does that sound good? Now the process consumes ~2-3% of the CPU anyway :) – ogs Oct 26 '16 at 14:25
  • @SnP Should be fine. Just be careful with race conditions! – Luis de Arquer Oct 27 '16 at 13:10
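
As mentioned in the comments above, when the main loop only needs to run every second or two, a plain sleep between iterations is already enough to stop the spinning. Below is a minimal sketch of such a throttled loop; the 2-second period and do_periodic_work() are placeholders, not the actual dbus code:

#include <unistd.h>

static volatile int exitReq;        /* set elsewhere, e.g. from a signal handler */

static void do_periodic_work(void)
{
    /* placeholder for the requests that read the external values */
}

int main(void)
{
    while (!exitReq) {
        do_periodic_work();
        sleep(2);                   /* the thread really sleeps here: ~0% CPU between iterations */
    }
    return 0;
}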

I believe another take on this is that sched_yield() is basically the same as Windows' Sleep(0), where:

A value of zero causes the thread to relinquish the remainder of its time slice to any other thread that is ready to run. If there are no other threads ready to run, the function returns immediately, and the thread continues execution.

(ok, yes this is from MSDN - but it explains it well enough)

Pretty much as Luis de Arquer mentions (+1 there). But this is not usually an optimal approach - i.e. it's usually a "bit of a hack" to tell the OS to do its job as you want it to.

Not that it probably matters much, and I have done it plenty in the past for little apps... but you can try using pthread_cond_wait() to do the waiting and pthread_cond_broadcast() or pthread_cond_signal() to wake up - or, with so many threads, maybe a semaphore - where each thread signals when it is finished so that main can continue... you have a few options here.
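
A rough sketch of the condition-variable version of that idea; the worker, the 2-second sleep and the flag name are made up here, not taken from the question's code:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int exitReq;                           /* protected by lock */

static void *worker(void *arg)
{
    (void)arg;
    sleep(2);                                 /* stand-in for the real task */
    pthread_mutex_lock(&lock);
    exitReq = 1;                              /* publish the request ... */
    pthread_cond_signal(&cond);               /* ... and wake the main thread */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);

    pthread_mutex_lock(&lock);
    while (!exitReq)                          /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);      /* releases the lock and sleeps: no busy waiting */
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return 0;
}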

code_fodder
  • Actually, I wonder if there is a proper use case for `sched_yield()` or `Sleep(0)` where they are really useful (in user space). – Luis de Arquer Oct 24 '16 at 13:48
  • lol... good question, ermmm, I actually use it quite a lot when I am being lazy, or just don't care about a slight performance hit (or more usually usleep(1)/sleep(1)) - usually in minor projects or test apps. But your question is interesting... probably worth posting it up as a question! – code_fodder Oct 24 '16 at 14:01
  • Found [this](http://stackoverflow.com/a/24999481/7033248) on a Java thread, so it would probably be flagged as a duplicate. But I am not too sure that's a good approach at all... Multiple threads working on the very same bytes at a large scale - isn't that terrible for the cache? – Luis de Arquer Oct 24 '16 at 14:10
  • @LuisdeArquer Well, I read that article; it seems reasonable, but as suggested it is a very fine-grained optimization, and optimization is usually a later-stage process, and as you say it doesn't feel like the only solution to that - although his solution is probably the fastest! Not sure why the cache would be a problem - it's the same memory locations, so they should stay in memory? Maybe it's useful for low-level graphics processing or such? – code_fodder Oct 24 '16 at 14:39
  • It may well be the fastest solution, yes, but I'm having doubts because every byte (cell) is processed by one thread but read by two other threads (the threads processing the rows before and after), if I understood it well. I always thought memory visibility across multiple cores has a high cost in speed, as the fast caches (L1 and possibly L2) are not shared between cores. See [this](http://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory). I'd probably try an approach where the matrix is divided into big blocks processed separately, – Luis de Arquer Oct 24 '16 at 16:07