
I am getting started with pthreads in C, and I am also a maniac about writing my code as "bug-free" as I possibly can.

Despite trying to be extra careful, valgrind tells me that I am leaking memory, regardless of whether:

  1. I create joinable threads that I join upon completion (code snippet 1)
  2. I create joinable threads that I detach after creation (code snippet 2)
  3. I create detached threads (code snippet 3)

I know this has already been discussed (see this, this and also this), but I am still curious as to:

  1. Why do I end up with no errors on certain runs?
  2. Why does there seem to be a random number of overall mallocs() when dealing with detached threads? << answer provided by nos; code snippets "fixed" with an added delay in main()
  3. Why does the "memory leak" persist even with detached threads? << same as 2.

As I understand from the previous answers and the valgrind trace, pthread_create() is the root cause: it extends the stack used by threads as necessary and sometimes reuses it, hence a few missing frees. What is less clear is why this depends on the execution run and why it also happens when creating detached threads. From certain answers, comments and the man page, I understand that the resources of a detached thread are freed upon thread completion. I have tried various tweaks to work around this (adding a sleep before the end of each thread and before the end of the main thread, increasing the stack size, adding more "work"...), but none of them changed the end result much. Also, why is there a random number of overall mallocs() when dealing with detached threads? Does valgrind lose track of some of the detached threads? This does not seem to depend on the stack size either.

The provided code is a mock example of a manager/workers model for which a joinable/join() approach to thread management seems more suitable imho.

Thanks for any enlightenment you might be able to provide! I also hope these (over-commented) snippets of code will be helpful to anyone wishing to get started with pthreads.

- swappy

PS: System info: gcc on Debian, 64-bit arch.

Code snippet 1 (joinable threads joined):

/* Running this multiple times with valgrind, I sometimes end up with:
    - no errors (proper malloc/free balance) 
    - 4 extra malloc vs free (most frequently) 
   The number of mallocs() is more conservative and depends on the number of threads. 
*/

#include <stdlib.h>             /* EXIT_FAILURE, EXIT_SUCCESS macros & the likes */
#include <stdio.h>              /* printf() & the likes */
#include <pthread.h>            /* test subject */

#define MAX_THREADS 100         /* Number of threads */
pthread_attr_t tattr;           /* Thread attribute */
pthread_t workers[MAX_THREADS]; /* All the threads spawned by the main() thread */

/* A mock container structure to pass arguments around */
struct args_for_job_t {
    int tid;
    int status;
};

/* The job each worker will perform upon creation */
void *job(void *arg)
{
    /* Cast arguments in a proper container */
    struct args_for_job_t *container;
    container = (struct args_for_job_t *)arg;

    /* A mock job */
    printf("[TID - %d]\n", container->tid);

    /* Properly exit with status code tid */
    pthread_exit((void *)(&container->status));
}

int main ()
{
    int return_code;                            /* Will hold return codes */
    void *return_status;                        /* Will hold return status */
    int tid;                                    /* Thread id */
    struct args_for_job_t args[MAX_THREADS];    /* For thread safeness */

    /* Initialize and set thread joinable attribute */
    pthread_attr_init(&tattr);
    pthread_attr_setdetachstate(&tattr, PTHREAD_CREATE_JOINABLE);

    /* Spawn joinable threads */
    for (tid = 0; tid < MAX_THREADS; tid++)
    {
        args[tid].tid = tid;
        args[tid].status = tid;
        return_code = pthread_create(&workers[tid], &tattr, job, (void *)(&args[tid]));
        if (return_code != 0) { printf("[ERROR] Thread creation failed\n"); return EXIT_FAILURE; }
    }

    /* Free thread attribute */
    pthread_attr_destroy(&tattr);

    /* Properly join() all workers before completion */
    for(tid = 0; tid < MAX_THREADS; tid++)
    {
        return_code = pthread_join(workers[tid], &return_status);
        if (return_code != 0)
        {
            printf("[ERROR] Return code from pthread_join() is %d\n", return_code);
            return EXIT_FAILURE;
        }
        printf("Thread %d joined with return status %d\n", tid, *(int *)return_status);
    }

    return EXIT_SUCCESS;
}

Code snippet 2 (detached threads after creation):

/* Running this multiple times with valgrind, I sometimes end up with:
    - no errors (proper malloc/free balance) 
    - 1 extra malloc vs free (most frequently) 
   Most surprisingly, it seems there is a random amount of overall mallocs 
*/

#include <stdlib.h>             /* EXIT_FAILURE, EXIT_SUCCESS macros & the likes */
#include <stdio.h>              /* printf() & the likes */
#include <pthread.h>            /* test subject */
#include <unistd.h>             /* usleep() */

#define MAX_THREADS 100         /* Number of threads */
pthread_attr_t tattr;           /* Thread attribute */
pthread_t workers[MAX_THREADS]; /* All the threads spawned by the main() thread */

/* A mock container structure to pass arguments around */
struct args_for_job_t {
    int tid;
};

/* The job each worker will perform upon creation */
void *job(void *arg)
{
    /* Cast arguments in a proper container */
    struct args_for_job_t *container;
    container = (struct args_for_job_t *)arg;

    /* A mock job */
    printf("[TID - %d]\n", container->tid);

    /* For the sake of returning something, not necessary */
    return NULL;
}

int main ()
{
    int return_code;                            /* Will hold return codes */
    int tid;                                    /* Thread id */
    struct args_for_job_t args[MAX_THREADS];    /* For thread safeness */

    /* Initialize and set thread joinable attribute */
    pthread_attr_init(&tattr);
    pthread_attr_setdetachstate(&tattr, PTHREAD_CREATE_JOINABLE);

    /* Spawn joinable threads and detach each one right after creation */
    for (tid = 0; tid < MAX_THREADS; tid++)
    {
        args[tid].tid = tid;
        return_code = pthread_create(&workers[tid], &tattr, job, (void *)(&args[tid]));
        if (return_code != 0) { printf("[ERROR] Thread creation failed\n"); return EXIT_FAILURE; }
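        /* Detach worker after creation */
        return_code = pthread_detach(workers[tid]);
        if (return_code != 0) { printf("[ERROR] Thread detach failed\n"); return EXIT_FAILURE; }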
    }

    /* Free thread attribute */
    pthread_attr_destroy(&tattr);

    /* Give detached threads time to finish before main() returns (a fixed delay makes this likely, not guaranteed) */
    usleep(100000);
    return EXIT_SUCCESS;
}

Code snippet 3 (detached threads upon creation):

/* Running this multiple times with valgrind, I sometimes end up with:
    - no errors (proper malloc/free balance) 
    - 1 extra malloc vs free (most frequently) 
   Most surprisingly, it seems there is a random amount of overall mallocs 
*/

#include <stdlib.h>             /* EXIT_FAILURE, EXIT_SUCCESS macros & the likes */
#include <stdio.h>              /* printf() & the likes */
#include <pthread.h>            /* test subject */
#include <unistd.h>             /* usleep() */

#define MAX_THREADS 100         /* Number of threads */
pthread_attr_t tattr;           /* Thread attribute */
pthread_t workers[MAX_THREADS]; /* All the threads spawned by the main() thread */

/* A mock container structure to pass arguments around */
struct args_for_job_t {
    int tid;
};

/* The job each worker will perform upon creation */
void *job(void *arg)
{
    /* Cast arguments in a proper container */
    struct args_for_job_t *container;
    container = (struct args_for_job_t *)arg;

    /* A mock job */
    printf("[TID - %d]\n", container->tid);

    /* For the sake of returning something, not necessary */
    return NULL;
}

int main ()
{
    int return_code;                            /* Will hold return codes */
    int tid;                                    /* Thread id */
    struct args_for_job_t args[MAX_THREADS];    /* For thread safeness */

    /* Initialize and set thread detached attribute */
    pthread_attr_init(&tattr);
    pthread_attr_setdetachstate(&tattr, PTHREAD_CREATE_DETACHED);

    /* Spawn detached threads */
    for (tid = 0; tid < MAX_THREADS; tid++)
    {
        args[tid].tid = tid;
        return_code = pthread_create(&workers[tid], &tattr, job, (void *)(&args[tid]));
        if (return_code != 0) { printf("[ERROR] Thread creation failed\n"); return EXIT_FAILURE; }
    }

    /* Free thread attribute */
    pthread_attr_destroy(&tattr);

    /* Give detached threads time to finish before main() returns (a fixed delay makes this likely, not guaranteed) */
    usleep(100000);
    return EXIT_SUCCESS;
}

Valgrind output for code snippet 1 (joined threads & mem-leak)

==27802== 
==27802== HEAP SUMMARY:
==27802==     in use at exit: 1,558 bytes in 4 blocks
==27802==   total heap usage: 105 allocs, 101 frees, 28,814 bytes allocated
==27802== 
==27802== Searching for pointers to 4 not-freed blocks
==27802== Checked 104,360 bytes
==27802== 
==27802== 36 bytes in 1 blocks are still reachable in loss record 1 of 4
==27802==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27802==    by 0x400894D: _dl_map_object (dl-load.c:162)
==27802==    by 0x401384A: dl_open_worker (dl-open.c:225)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x4013319: _dl_open (dl-open.c:639)
==27802==    by 0x517F601: do_dlopen (dl-libc.c:89)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x517F6C3: __libc_dlopen_mode (dl-libc.c:48)
==27802==    by 0x4E423BB: pthread_cancel_init (unwind-forcedunwind.c:53)
==27802==    by 0x4E4257B: _Unwind_ForcedUnwind (unwind-forcedunwind.c:130)
==27802==    by 0x4E4069F: __pthread_unwind (unwind.c:130)
==27802==    by 0x4E3AFF4: pthread_exit (pthreadP.h:265)
==27802== 
==27802== 36 bytes in 1 blocks are still reachable in loss record 2 of 4
==27802==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27802==    by 0x400B7EC: _dl_new_object (dl-object.c:161)
==27802==    by 0x4006805: _dl_map_object_from_fd (dl-load.c:1051)
==27802==    by 0x4008699: _dl_map_object (dl-load.c:2568)
==27802==    by 0x401384A: dl_open_worker (dl-open.c:225)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x4013319: _dl_open (dl-open.c:639)
==27802==    by 0x517F601: do_dlopen (dl-libc.c:89)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x517F6C3: __libc_dlopen_mode (dl-libc.c:48)
==27802==    by 0x4E423BB: pthread_cancel_init (unwind-forcedunwind.c:53)
==27802==    by 0x4E4257B: _Unwind_ForcedUnwind (unwind-forcedunwind.c:130)
==27802== 
==27802== 312 bytes in 1 blocks are still reachable in loss record 3 of 4
==27802==    at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27802==    by 0x4010B59: _dl_check_map_versions (dl-version.c:300)
==27802==    by 0x4013E1F: dl_open_worker (dl-open.c:268)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x4013319: _dl_open (dl-open.c:639)
==27802==    by 0x517F601: do_dlopen (dl-libc.c:89)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x517F6C3: __libc_dlopen_mode (dl-libc.c:48)
==27802==    by 0x4E423BB: pthread_cancel_init (unwind-forcedunwind.c:53)
==27802==    by 0x4E4257B: _Unwind_ForcedUnwind (unwind-forcedunwind.c:130)
==27802==    by 0x4E4069F: __pthread_unwind (unwind.c:130)
==27802==    by 0x4E3AFF4: pthread_exit (pthreadP.h:265)
==27802== 
==27802== 1,174 bytes in 1 blocks are still reachable in loss record 4 of 4
==27802==    at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27802==    by 0x400B57D: _dl_new_object (dl-object.c:77)
==27802==    by 0x4006805: _dl_map_object_from_fd (dl-load.c:1051)
==27802==    by 0x4008699: _dl_map_object (dl-load.c:2568)
==27802==    by 0x401384A: dl_open_worker (dl-open.c:225)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x4013319: _dl_open (dl-open.c:639)
==27802==    by 0x517F601: do_dlopen (dl-libc.c:89)
==27802==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==27802==    by 0x517F6C3: __libc_dlopen_mode (dl-libc.c:48)
==27802==    by 0x4E423BB: pthread_cancel_init (unwind-forcedunwind.c:53)
==27802==    by 0x4E4257B: _Unwind_ForcedUnwind (unwind-forcedunwind.c:130)
==27802== 
==27802== LEAK SUMMARY:
==27802==    definitely lost: 0 bytes in 0 blocks
==27802==    indirectly lost: 0 bytes in 0 blocks
==27802==      possibly lost: 0 bytes in 0 blocks
==27802==    still reachable: 1,558 bytes in 4 blocks
==27802==         suppressed: 0 bytes in 0 blocks
==27802== 
==27802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
--27802-- 
--27802-- used_suppression:      2 dl-hack3-cond-1
==27802== 
==27802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

Valgrind output for code snippet 1 (no mem-leak, a few runs later)

--29170-- Discarding syms at 0x64168d0-0x6426198 in /lib/x86_64-linux-gnu/libgcc_s.so.1 due to munmap()
==29170== 
==29170== HEAP SUMMARY:
==29170==     in use at exit: 0 bytes in 0 blocks
==29170==   total heap usage: 105 allocs, 105 frees, 28,814 bytes allocated
==29170== 
==29170== All heap blocks were freed -- no leaks are possible
==29170== 
==29170== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
--29170-- 
--29170-- used_suppression:      2 dl-hack3-cond-1
==29170== 
==29170== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
  • What's the valgrind output? (Aside: There's no need to use `pthread_exit` anywhere in that code, you can just `return 0;` instead.) – Jonathan Wakely Mar 04 '13 at 18:27
  • Applied your correction, using return 0; instead of pthread_exit() in main(). I also added the valgrind output for the joined code snippet. Increasing the completion time of main(), as suggested by nos, cleared up my second and third questions. – swappy Mar 05 '13 at 09:25
  • That's not really a memory leak, all memory is still reachable – Jonathan Wakely Mar 05 '13 at 09:31
  • Apologies, my comment about `pthread_exit` was only relevant to snippet 1, where it's useless (because you join the threads). In snippets 2 and 3 calling it at the end of `main` is correct – Jonathan Wakely Mar 05 '13 at 09:36
  • Agreed, it is not a memory leak in the true sense, and not an error at all, but what bugs me is why everything is freed properly on certain runs while on most it is not. – swappy Mar 05 '13 at 09:37

1 Answer


You have a bug when your threads are detached, and it causes undefined behavior.

In main you have this line of code:

struct args_for_job_t args[MAX_THREADS];

You then hand pointers into this array off to your worker threads.

Then main() reaches this line:

pthread_exit(NULL);

And main() ceases to exist, but you may still have worker threads around that access the above args array on the stack of main(), which doesn't exist anymore. Your worker threads might all finish before main() ends in some runs, but not in others.

nos
  • Thanks nos, I suspected such behaviour but wanted to double-check. The thing is, I also tried adding a timer (usleep()) before pthread_exit(NULL) (or return 0, as Jonathan suggested) and it still behaved randomly. I was under the impression that using pthread_exit() instead of return would make the main() thread hang until all workers are done (exactly what pthread_join() does). – swappy Mar 05 '13 at 09:05
  • I stand corrected: I bumped the sleep time from 10 000 to 100 000 µs with detached threads, and all memory leaks seem to be gone. This leaves enough time for all threads to complete before main() terminates. – swappy Mar 05 '13 at 09:24
  • Sorry, you're right that the `pthread_exit` in `main` causes it to wait. The calls to `pthread_exit` in the other threads and in snippet 1 are redundant – Jonathan Wakely Mar 05 '13 at 09:33
  • Well, it does not seem to make main() wait when dealing with detached threads: if I increase the wait time to "force" it to wait, there are no errors. The reason I left pthread_exit() in the detached threads is to minimise code changes from one version to another, but I agree it is redundant and I will fix this in the snippets. Thanks! – swappy Mar 05 '13 at 09:39
  • Actually, I realised I had already removed them in the posted version; I added "return NULL;" just for the sake of it, but it is indeed redundant. I still want to keep pthread_exit() in the joined version, as I will rely on the return status to do other work. – swappy Mar 05 '13 at 09:42
  • @swappy It's not just an issue of the main() thread hanging around. It's the main() function that ends. pthread_exit() may very well have the magic to make it behave as if you did a normal return 0; from main, and the stack may be destroyed, or reused for cleanup code that runs when main is done, etc. That is, the effect may be similar to returning a pointer to a local variable from a function. I'd also not be surprised if the behavior when main ends differs depending on whether you compile and link the program with the `-pthread` flag or just link in pthread with `-lpthread`. – nos Mar 05 '13 at 12:03
  • Aha I understand now. So if I want to pass arguments around to detached threads, they have to be either on the heap or declared as global variables, right? I currently compile it with -pthread, I'll see what happens when I link it. Thanks for the hints so far nos! – swappy Mar 05 '13 at 12:24
  • @swappy It's ok to pass around stack data, as long as you make sure that memory is available when the threads need it. But in many cases that's not so straightforward, and heap-allocated data may make things easier. – nos Mar 05 '13 at 12:43
  • Thanks for all these explanations, nos! That leaves one last question: with joinable threads, why does the system sometimes release all the allocated data before completion and other times keep some memory reachable? Running code snippet 1 several times through valgrind sometimes outputs no errors, regardless of the stack size assigned to new threads (I tried playing with pthread_attr_setstacksize()). – swappy Mar 05 '13 at 12:53
  • @swappy The code snippet 1 should as far as I can see be fine, I cannot reproduce any errors or leaks with it here, using Fedora. I suspect those are false positives or something that's fixed in versions . – nos Mar 05 '13 at 13:15
  • I wonder why there is such a difference between systems and what triggers the OS to behave this way. If anyone has an idea, I'm interested. Either way, I'll regard this question as solved. Thanks again to both nos and Jonathan for the help. – swappy Mar 06 '13 at 10:26