
I have read a lot of SO posts regarding memory management in the C++11 STL, but I could not really find a satisfying answer.

My situation: I am developing a long-running server [it runs for around 4-6 weeks]. At the moment I use a lot of old C-style code with `char [x][y]` or `char [z]` variables located on the stack.

I have my doubts whether the STL memory management is still reliable when it is used extensively in a program that runs for weeks and serves, in that period, more than 10 million threads, each of which performs a lot of STL operations.

To be more specific: I want to rewrite all fixed-size variables located on the stack as `std::vector<std::string>` or `std::string`.
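For illustration, this is the kind of rewrite I mean (the names and sizes below are made up, not from my actual code):

```cpp
#include <string>
#include <vector>

// Old C-style approach: fixed-size buffers on the stack.
void old_style() {
    char lines[128][256];
    char name[64];
    (void)lines; (void)name;   // silence unused-variable warnings
}

// Intended replacement: dynamically sized standard containers.
void new_style() {
    std::vector<std::string> lines;   // grows as needed, heap-backed
    std::string name;
    (void)lines; (void)name;
}
```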

My questions:

  1. Can I safely rewrite my program completely to the modern STL notation and get rid of the old C code?
  2. Is there any memory fragmentation when running for that long a time with millions of threads?
  3. What about performance? With the old C code, having the variables on the stack does not have any performance impact.

The compiler is gcc 4.9.3.

Peter VARGA

2 Answers


Can I safely rewrite my program completely to the modern STL notation and get rid of the old C code?

First, STL is not new; it dates back to well before C++ itself was standardized. Second, we call it the C++ standard library.

Third, as long as your threads follow the requirements of C++ (i.e. don't terminate in a way that C++ doesn't allow), and you don't leak memory, then yes, you'll be fine.

Is there any memory fragmentation when running for that long a time with millions of threads?

You're going from objects living on the stack to dynamically allocating memory. Of course there is the possibility of memory fragmentation.

That has absolutely nothing to do with C++ standard library containers. It's an outgrowth of using dynamic allocations.

Equally importantly, you could just use std::array<char, ...> if you want to use a nicer, fixed-size stack array. Then again, std::string implementations with small string optimization offer a pretty good compromise in a lot of cases, forgoing allocating memory if the string is below some maximum size.
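To make that concrete, here is a minimal sketch of both options; the sizes and contents are purely illustrative:

```cpp
#include <array>
#include <cstdio>
#include <string>

int main() {
    // Drop-in for `char buf[64]`: still lives on the stack, but knows its
    // size and plays nicely with standard algorithms.
    std::array<char, 64> buf{};
    std::snprintf(buf.data(), buf.size(), "request #%d", 42);

    // With small string optimization, a short string like this typically
    // needs no heap allocation at all. (Note: gcc 4.9's copy-on-write
    // std::string does not do SSO, as the comments below point out.)
    std::string name = "worker-7";

    std::printf("%s handled by %s\n", buf.data(), name.c_str());
}
```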

What about performance? With the old C code, having the variables on the stack does not have any performance impact.

It made your stack larger, which, given the 10 million threads, could have caused you to commit more pages of memory. Then again, maybe not.

In any case, memory allocation is always an issue in a heavily multi-threaded application. A general-purpose memory allocator, by its nature, has to be thread-safe. That means mutex locking and so forth.

You can devise atomic ways of allocating and deallocating memory, but that tends to require allocations of fixed sizes. And such things tend to have their own downsides. You could have thread-local memory pools that you allocate from. All of those require using your own memory allocators.
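For example, a thread-local pool could look roughly like the sketch below. This is only an illustration with a made-up block size, and it assumes each block is freed on the same thread that allocated it:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Rough sketch of the thread-local pool idea, not a production allocator:
// each thread recycles fixed-size blocks from its own free list, so the
// hot path needs no locking at all.
constexpr std::size_t kBlockSize = 256;   // the "fixed size" mentioned above

struct LocalPool {
    std::vector<void*> free_list;

    void* get() {
        if (!free_list.empty()) {         // fast path: reuse a freed block
            void* p = free_list.back();
            free_list.pop_back();
            return p;
        }
        return std::malloc(kBlockSize);   // slow path: ask the heap
    }

    // Note: assumes p was allocated by this same thread's pool.
    void put(void* p) { free_list.push_back(p); }

    ~LocalPool() {                        // give everything back at thread exit
        for (void* p : free_list) std::free(p);
    }
};

thread_local LocalPool tls_pool;          // one pool per thread, never shared

void* pool_alloc()       { return tls_pool.get(); }
void  pool_free(void* p) { tls_pool.put(p); }
```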

But most importantly of all... these issues again have nothing to do with using C++ standard library types specifically. This is simply what happens when you go from static memory to dynamic allocations. Whether you're using malloc/free or standard library containers, the issue is with dynamic allocations.

Nicol Bolas
  • Good point about the C stack being _bigger_. With GCC now supporting [Small String Optimization](http://stackoverflow.com/questions/31228579/why-is-cow-stdstring-optimization-still-enabled-in-gcc-5-1), you will even get the best of both worlds: short strings on stack (no fragmentation issue), long strings on heap (keeps stack small). Not yet there for 4.9.3 unless you roll your own. – MSalters Jun 15 '16 at 20:47
  • @MSalters gcc/libstdc++ also has [`vstring`](http://stackoverflow.com/questions/10463851/what-is-gccs-vstring) if there's need for small string optimization. – dyp Jun 15 '16 at 21:28
  • OP was asking whether he can be sure fragmentation won't *gradually increase* over time until he runs out of memory. And it's legitimate to have concerns about the standard library's memory allocator. – einpoklum Jun 15 '16 at 21:41
  • @einpoklum: "*And it's legitimate to have concerns about the standard library's memory allocator.*" That doesn't change the facts that A) you don't have to use the standard library's allocator to use standard library containers, and B) those concerns are the result of relying on dynamic allocations, not of the standard library itself. You'd have those same concerns if you use `new` directly or `make_unique` pointers. So it has nothing to do with the reliability of the standard library itself. – Nicol Bolas Jun 15 '16 at 21:45
  • @einpoklum: "*OP was asking whether he can be sure fragmentation won't gradually increase over time until he runs out of memory.*" Fragmentation cannot cause you to run out of memory. It can only cause you to be unable to find sufficient contiguous free memory. Furthermore, there's no way to know, since it depends on the details of *exactly* what these millions of threads are doing. So that part of the question is effectively unanswerable. – Nicol Bolas Jun 15 '16 at 21:47

First, I am very grateful for all the comments and Nicol's answer. His last comment, regarding the fragmentation, hit the nail on the head.

1) The fragmentation depends on the details of exactly what these millions of threads are doing.

After analyzing the project in depth, I realized there are millions of memory allocations and releases.

Therefore I wrote my own STL memory allocator (a simplified sketch follows this list), which:

  1. has an internal `unordered_map` to keep track of all the pointers.
  2. is multi-thread safe.
  3. reuses pointers which are marked as free (see point 1).
  4. aligns the memory size requested [by the STL] to a 16-byte boundary, in order to reduce the load on the allocator.
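A heavily simplified sketch of that design looks roughly like this (illustrative only; the 32-byte bookkeeping header described in the legend below, error handling and the statistics are left out):

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_map>
#include <vector>

// Global, mutex-protected pool: rounds every request up to a 16-byte
// boundary and hands back previously freed blocks of the same bucket
// size instead of going to the heap again.
class BlockPool {
public:
    void* acquire(std::size_t bytes) {
        const std::size_t size = align16(bytes);
        std::lock_guard<std::mutex> lock(mutex_);       // multi-thread safety
        std::vector<void*>& bucket = free_blocks_[size];
        if (!bucket.empty()) {                          // reuse a freed block
            void* p = bucket.back();
            bucket.pop_back();
            return p;
        }
        void* p = std::malloc(size);                    // first time: real allocation
        if (!p) throw std::bad_alloc();
        block_size_[p] = size;                          // remember its bucket
        return p;
    }

    void release(void* p) {                             // p must come from acquire()
        std::lock_guard<std::mutex> lock(mutex_);
        free_blocks_[block_size_[p]].push_back(p);      // keep for reuse, never free()
    }

    static BlockPool& instance() {
        static BlockPool pool;
        return pool;
    }

private:
    static std::size_t align16(std::size_t n) { return (n + 15) & ~std::size_t(15); }

    std::mutex mutex_;
    std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;  // size -> freed pointers
    std::unordered_map<void*, std::size_t> block_size_;                // pointer -> bucket size
};

// Thin allocator facade so the pool can be plugged into standard containers.
template <typename T>
struct PoolAllocator {
    using value_type = T;
    template <typename U> struct rebind { using other = PoolAllocator<U>; };

    PoolAllocator() = default;
    template <typename U> PoolAllocator(const PoolAllocator<U>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(BlockPool::instance().acquire(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        BlockPool::instance().release(p);
    }
};

template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

// Usage example: a vector that allocates through the pool.
// std::vector<int, PoolAllocator<int>> numbers;
```

The facade at the bottom is what lets standard containers such as `std::vector` use the pool instead of the default allocator.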

My STL memory allocator logs all requests, and this is the summary [digest]:

Statistics:
     Total allocated Memory: 813'041'344 bytes
     Administrative Memory : 3'464'152 bytes
     Available pointers    : 2'500
+-------------------------------------------------------------------------+
| Index | Aligned Memory Size | Max Used Pointers | Total Requested Count |
+-------------------------------------------------------------------------+
|      1|                   48|                296|             49'545'399|
|      2|                   64|                469|             73'226'993|
|      3|                   80|              1'167|             67'108'769|
|      4|                   96|                129|             12'864'168|
|      5|                  112|                281|              4'528'422|
|      6|                  128|                 64|              8'715'454|
|      7|                  144|                 74|              5'148'202|
|     10|                  192|                387|              1'313'920|
|     11|                  208|                 26|              1'311'779|
|     13|                  272|                 56|             11'574'551|
|     15|                  352|                368|              1'178'994|
|     18|                  512|                262|              3'224'044|
|     22|                  656|                  5|              2'586'081|        
+-------------------------------------------------------------------------+

Legend:

  • Aligned Memory Size: Each requested size is rounded up to a multiple of 16 bytes, plus 32 bytes of bookkeeping data. E.g. allocating 1 byte results in a memory block with a real size of 48 bytes.
  • Max Used Pointers: The maximum number of memory blocks of this size that were in use at the same time across all running threads. In other words, this much memory [Size * Max Used Pointers] was physically allocated from the OS.
  • Total Requested Count: This count is incremented each time an allocation [for the aligned size] is requested.
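To read the table, take row 1 as an example: at most 296 blocks of 48 bytes were in use at the same time, so only about 14 KB (296 * 48 = 14,208 bytes) was ever physically held for that bucket, while roughly 49.5 million allocation requests of that size were served from it.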

For me this means I definitely save millions of allocations and releases, and I do not know what the fragmentation would look like with the default STL allocator.


2) I could get rid of the old C-style code and use the more convenient STL containers.


3) The performance is OK. My allocator is not the fastest one, but considering that it is fully multi-thread safe and serves thousands of requests per second, it is more than sufficient for my needs.


So the answer is that [in fairness] I still do not know how reliable the default STL memory allocator is, but thanks to the facts above I have gained at least a clue about what is going on internally.

Supposing my allocator is bug-free [which I can expect after it has been running for a long time and serving millions of requests], I can consider this case closed.

Peter VARGA