7

I wrote a simple program that should run two threads, sort a small array (~4096 bytes), and write the result to an output file. The input data is contained in one big file (~4 GB). The computer has 128 MB of memory. I found that running just an empty main function uses 14 MB of memory. If I run std::thread with an empty function, the application starts to use ~8 MB per thread. BUT if I make just one dynamic memory allocation, the program starts to use approximately 64 MB per thread. I don't understand what can consume so much memory. How can I control this size? And how should I allocate dynamic memory so as to minimize whatever the system allocates by default?

  • System: Ubuntu 14.04.3
  • Compiler: gcc 4.8.4
  • Compiler options: `-std=c++11 -O3 -pthread`

  • This is a code example:

    #include <thread>
    #include <vector>

    void dummy()
    {
        // One small dynamic allocation, then a busy loop to keep the
        // thread alive long enough to observe its memory usage.
        std::vector<unsigned int> g(1);
        int i = 0;
        while (i < 500000000)
        {
            ++i;
        }
    }

    int main()
    {
        std::thread t1(&dummy);
        std::thread t2(&dummy);
        std::thread t3(&dummy);
        t1.join();
        t2.join();
        t3.join();
        return 0;
    }
    
Miltiad
  • Try different allocators, e.g. jemalloc, tcmalloc. – Kerrek SB Jan 31 '17 at 19:46
  • How did you determine how much memory the program used? – Pete Becker Jan 31 '17 at 19:46
  • I do not trust your measurements. How did you find out the memory footprint? – SergeyA Jan 31 '17 at 19:48
  • I read the /proc/.../status file and the VmSize value in it. I use the pmap utility too. – Miltiad Jan 31 '17 at 19:51
  • Even if the thread function does nothing, the OS may allocate a stack. 8 MB is not out of line with reality. The 64 MB... That's a bit weird. – user4581301 Jan 31 '17 at 19:54
  • What do you get with `ulimit -s` in the Ubuntu console? – Rama Jan 31 '17 at 20:03
  • **ulimit -s** returns 8192 – Miltiad Jan 31 '17 at 20:06
  • @user4581301 I could see allocating address space, but why actual pages? Oh wait, OP is reading **VmSize**? That seems like a strange thing to be measuring. Why do you care if your process asks for a gigabyte of address space and then never touches any of it? VM space for untouched memory should be cheap, unless your hardware doesn't support leaving untouched VM pages physically unallocated, or some such. – Yakk - Adam Nevraumont Jan 31 '17 at 20:09

2 Answers

8

Every thread has its own stack. On Linux, the default stack size is 8 MB. When you start allocating memory for the first time, the heap memory allocator might actually reserve a big chunk up front. This might explain the 64 MB per thread you are seeing.

That said, when I say "allocated", that doesn't mean that this memory is really used. The allocation happens in the virtual memory space of the process. This is what you see under the column VSZ when you run ps or under the column VIRT when you run top. But Linux knows that you probably are not going to use most of that allocated memory anyway. So while you have allocated a chunk of virtual memory, Linux does not allocate any physical memory to back that up, until the process actually starts writing to that memory. The real physical amount of memory used by a process is seen under RSS for ps and RES for top. Linux allows more virtual memory to be allocated than there is physical memory in total.
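
For example, you can watch both numbers from inside the process by reading /proc/self/status, which is where you got your VmSize figure (a minimal sketch; the VmSize/VmRSS field names are Linux-specific):

    #include <fstream>
    #include <iostream>
    #include <string>

    // Print the virtual (VmSize) and resident (VmRSS) memory of the
    // current process, as reported by the Linux /proc filesystem.
    void print_memory_usage()
    {
        std::ifstream status("/proc/self/status");
        std::string line;
        while (std::getline(status, line))
        {
            if (line.compare(0, 6, "VmSize") == 0 ||
                line.compare(0, 5, "VmRSS") == 0)
            {
                std::cout << line << '\n';
            }
        }
    }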

Even though you might not run out of physical memory, if you have a lot of threads on a 32-bit system, each of which allocates 8 MB of virtual memory, you might run out of the virtual address space of your process (which is on the order of 2 GB). While C++'s thread library does not allow you to change the size of the stack, the C pthreads library does: supply pthread_create() with a pthread_attr_t that you have adjusted using pthread_attr_setstacksize(). See also this stackoverflow question.
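
A minimal sketch of that pthreads approach (assuming a 64 KiB stack is enough for the thread's work; error checking omitted, and the size must be at least PTHREAD_STACK_MIN):

    #include <pthread.h>

    void *worker(void *)  // pthreads requires this signature for the thread function
    {
        // ... sort the small array here ...
        return nullptr;
    }

    int main()
    {
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setstacksize(&attr, 64 * 1024);  // request a 64 KiB stack

        pthread_t t;
        pthread_create(&t, &attr, &worker, nullptr);
        pthread_join(t, nullptr);

        pthread_attr_destroy(&attr);
        return 0;
    }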

G. Sliepen
  • Given that the system has a total of 128 MB RAM, it's unlikely that your process will run out of address space. Linux will likely run out of RAM first. – MSalters Feb 01 '17 at 10:30
  • @MSalters: Linux overcommits memory, so it certainly can run out of address space before running out of RAM. – G. Sliepen Feb 01 '17 at 19:27
  • In theory it can, but with about 80 MB of RAM available (20K pages) and 2048 MB of address space (512K pages), you'd need to have a massive overcommit. – MSalters Feb 02 '17 at 09:08
  • There is no limit to how much Linux overcommits. It is very easy to allocate more address space than RAM. For example, if you write a network daemon that creates one thread for each incoming connection, then with 80 MB of RAM you would only need 10 incoming connections to use up all of it just for the stacks of those threads. Luckily, because of the overcommitting, your program will just continue to work fine. – G. Sliepen Feb 02 '17 at 15:07
0

The value you reported for ulimit -s in the comments above suggests that each thread is still allocating a stack, even if the function it runs is empty. The function call executed in the thread needs a stack to hold a return address, assuming you're on x86.

@Kerrek SB is heading in the right direction with this. The allocator that you are using can affect the heap size of your program. In order to avoid repeated calls to brk or sbrk, allocators will usually request larger initial blocks of memory. It isn't unreasonable to expect values on the order of megabytes -- especially values that are multiples of the page size, such as 4, 8, 32, or 64 MB -- when the allocator is first initialized.

As for controlling how much memory is allocated, your results may vary. See if your allocator supports the mallopt() function. With a bit of trial and error, you may be able to reduce your overall memory footprint. Otherwise, you could always implement your own allocator.
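
For instance, with glibc's malloc you could try something like the following early in main() (a sketch only; M_ARENA_MAX and M_MMAP_THRESHOLD are glibc-specific tunables, and their exact effect depends on your glibc version):

    #include <malloc.h>

    int main()
    {
        // Ask glibc's malloc to create fewer per-thread arenas and to hand
        // larger requests straight to mmap(), which tends to reduce the
        // amount of address space reserved up front for each thread.
        mallopt(M_ARENA_MAX, 1);
        mallopt(M_MMAP_THRESHOLD, 64 * 1024);

        // ... start threads, sort, write the output file ...
        return 0;
    }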

Colin V.