14

Valgrind is reporting leaked blocks, apparently one per thread, in the following code:

#include <iostream>
#include <thread>
#include <mutex>
#include <list>
#include <chrono>

std::mutex cout_mutex;

struct Foo
{
    Foo() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }

    ~Foo() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }

    void 
    hello_world() 
    { 
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n'; 
    }
};

void
hello_world_thread()
{
    thread_local Foo foo;

    // must access, or the thread local variable may not be instantiated
    foo.hello_world();

    // keep the thread around momentarily
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
}

int main()
{
    for ( int i = 0; i < 100; ++i )
    {
        std::list<std::thread> threads;

        for ( int j = 0; j < 10; ++j )
        {
            std::thread thread( hello_world_thread );
            threads.push_back( std::move( thread ) );
        }

        while ( ! threads.empty() )
        {
            threads.front().join();
            threads.pop_front();
        }
    }
}

Compiler version:

$ g++ --version
g++ (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

GCC build options:

--enable-shared
--enable-threads=posix
--enable-__cxa_atexit
--enable-clocale=gnu
--enable-cxx-flags='-fno-omit-frame-pointer -g3'
--enable-languages=c,c++
--enable-libstdcxx-time=rt
--enable-checking=release
--enable-build-with-cxx
--disable-werror
--disable-multilib
--disable-bootstrap
--with-system-zlib

Program compilation options:

g++ -std=gnu++11 -Og -g3 -Wall -Wextra -fno-omit-frame-pointer thread_local.cc

valgrind version:

$ valgrind --version
valgrind-3.8.1

Valgrind options:

valgrind --leak-check=full --verbose ./a.out > /dev/null

Tail-end of valgrind output:

==1786== HEAP SUMMARY:
==1786==     in use at exit: 24,000 bytes in 1,000 blocks
==1786==   total heap usage: 3,604 allocs, 2,604 frees, 287,616 bytes allocated
==1786== 
==1786== Searching for pointers to 1,000 not-freed blocks
==1786== Checked 215,720 bytes
==1786== 
==1786== 24,000 bytes in 1,000 blocks are definitely lost in loss record 1 of 1
==1786==    at 0x4C29969: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:329)
==1786==    by 0x4E8E53E: __cxa_thread_atexit (atexit_thread.cc:119)
==1786==    by 0x401036: hello_world_thread() (thread_local.cc:34)
==1786==    by 0x401416: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (functional:1732)
==1786==    by 0x4EE4830: execute_native_thread_routine (thread.cc:84)
==1786==    by 0x5A10E99: start_thread (pthread_create.c:308)
==1786==    by 0x573DCCC: clone (clone.S:112)
==1786== 
==1786== LEAK SUMMARY:
==1786==    definitely lost: 24,000 bytes in 1,000 blocks
==1786==    indirectly lost: 0 bytes in 0 blocks
==1786==      possibly lost: 0 bytes in 0 blocks
==1786==    still reachable: 0 bytes in 0 blocks
==1786==         suppressed: 0 bytes in 0 blocks
==1786== 
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
--1786-- 
--1786-- used_suppression:      2 dl-hack3-cond-1
==1786== 
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Constructors and destructors were run once for each thread:

$ ./a.out | grep 'Foo::Foo' | wc -l
1000

$ ./a.out | grep hello_world | wc -l
1000

$ ./a.out | grep 'Foo::~Foo' | wc -l
1000

Notes:

  • If you change the number of threads created, the number of leaked blocks matches the number of threads.
  • The code is structured in such a way that might permit resource reuse (i.e. the leaked block) if GCC were so implemented.
  • From the valgrind stacktrace, thread_local.cc:34 is the line: thread_local Foo foo;
  • Due to the sleep_for() call, a program run takes about 10 seconds or so.

Any idea if this memory leak is in GCC, a result of my config options, or is some bug in my program?

Stephen Croll
  • 143
  • 1
  • 6
  • 4
    This is a shining example of a perfectly composed question, well done. – GManNickG Jul 16 '13 at 05:48
  • Why do you use `static`? – stefan Jul 16 '13 at 07:36
  • Fixed. I re-ran valgrind, but unfortunately the leak still exists. – Stephen Croll Jul 16 '13 at 07:49
  • @user2224952 Well I didn't say it would fix the problem, I just thought it to be unnecessary ;-) – stefan Jul 16 '13 at 07:50
  • 1
    @user2224952 I've reduced the code to a minimal example showing the exact same behaviour. Unfortunately ideone does not use g++-4.8, but you can copy the code easily http://ideone.com/0SEkim Removing the destructor of Foo resolves the issue. – stefan Jul 16 '13 at 08:01
  • 1
    @stefan Well, removing the destructor is not actually a resolution, is it? That is rather limiting if you expect to do any sort of cleanup on thread exit. That solution restricts the types of data members of the thread_local class as well. For example, remove the user-specified destructor, then add a std::string member, and the leak is back. – Stephen Croll Jul 16 '13 at 08:32
  • @user2224952 Correct, it's not a solution you want to work with, but this is the point where we have to look deeper. I'd say either it is a restriction of `thread_local` or a bug in `g++`. I doubt that it's `thread_local` as it would make this keyword rather useless. – stefan Jul 16 '13 at 08:52
  • Absolutely shortest failing code: http://ideone.com/21zzwq This is without any libraries – stefan Jul 16 '13 at 08:58
  • Note: Added the valgrind version (3.8.1), in case that's useful. – Stephen Croll Jul 16 '13 at 09:01

2 Answers2

3

It seems that the leak comes from the dynamic initialization.

Here is an example with an int :

thread_local int num=4; //static initialization

The last example does not leak. I tried it with 2 threads and no leak at all.

But now :

int func()
{
    return 4;
}
thread_local int num2=func(); //dynamic initialization

This one leak ! With 2 threads it gives total heap uage: 8 allocs, 6 frees, 428 bytes allocated...

I would suggest you to use a workaround like :

thread_local Foo *foo = new Foo; //dynamic initialization

No forget at the end of the thread execution to do :

delete foo;

But the last example as one problem : What if the thread exit with error before your delete ? Leak again...

It seems that there is no great solution. Maybe we should report that to the g++ developers about that ?

Pierre Fourgeaud
  • 14,290
  • 1
  • 38
  • 62
0

try removing thread_local and using the following code

    void
    hello_world_thread()
    {
        Foo foo;

        // must access, or the thread local variable may not be instantiated
        foo.hello_world();

        // keep the thread around momentarily
        std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
    }

foo within hello_world_thread should be in the local stack for every thread. so every thread will maintain its own copy of foo. no need to explicitly marking it as thread_local. A thread_local should be used in a context when you have something like static or namespace level variable but you want each variable to maintain its own copy for every thread.

Regards Kajal

Kajal Sinha
  • 1,565
  • 11
  • 20
  • The actual use-case might be something more like: `Foo& foo() { thread_local Foo foo; return foo; } void hello_world_thread() { if ( some_condition ) { foo().hello_world(); } }` The thread local object would only be instantiated (and destroyed) if referenced by a thread. It appears I minimized my code example in the wrong place and added complexity where perhaps I shouldn't have (i.e. the thread construction behavior). – Stephen Croll Jul 16 '13 at 09:28