1

I have declared a map object with (int, string). The string size is 128 bytes. However, a new Map nodes size remains constant 48 bytes. I have checked this with custom allocation.

std::map<int, std::string, std::less<int>, my_allocator< std::pair<const int, std::string> > >  custom_map;

gen_random(random_string, 103); //generates a random string of size 103 and stored in random_string
custom_map.emplace(i, std::string(random_string)); //allocates 48 bytes for map
                                          /* the string is allocated separately */

My question is - what does the map node hold? (I am assuming based on the behavior of the above code, some metadata for the red-black tree, the key, and a pointer to the value are held in the map node.)


Motivation:

I have a kernel module that manages persistent memory. It makes persistent memory accessible to applications by mapping them to applications address space. It also has support atomic msync.

I am trying to develop a simple persistent key value store using c++ STL map. So, I tried to create a custom allocator to allocate the stl::map objects from the persistent memory. I have a memory pool mapped from the persistent memory device, which is used by the custom allocator. So I need to make sure, everything related to the MAP (key, value, internal nodes) are allocated from that pool.

When I saw the Map object/node size is less than the (int, string) pair size, I got confused as I assumed everything(key+value) will be contained inside the map node that is allocated using the custom allocator. However, that is not the case. So I need to know about the MAP nodes setup to guarantee everything (Not more or less) related to the Map object is allocated from the persistent memory pool.

I hope it clears the motivation. Any suggestion is highly appreciated.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
mchow
  • 11
  • 4
  • 2
    A string is unlikely to have a size, as reported by sizeof. of 128 bytes - more likely something like 12. –  Feb 23 '17 at 18:06
  • 3
    I'm guessing it's mostly implementation dependent. BTW the std::string is constant-sized, it holds a pointer to a dynamically allocated char array. – Peter Lenkefi Feb 23 '17 at 18:07
  • 2
    It is implementation specific, and you should not care. If you compile with [GCC](http://gcc.gnu.org/) you could look into the source code (inside `` and the internal headers that it is including); also with [Clang](http://clang.llvm.org/) which has a *different* C++ standard library. – Basile Starynkevitch Feb 23 '17 at 18:07
  • 1
    You'll be surprised by the output of `cout << sizeof(std::string(random_string)) << endl;`, but understand if you think about it. –  Feb 23 '17 at 18:14
  • Thanks! @PeterLenkefi. It makes more sense now. `std::string s = "Test";` allocates 29 bytes. So, I am guessing 24 bytes for the string object and 5 for the char array. – mchow Feb 23 '17 at 18:17
  • If you care that much, you might consider re-implementing your own containers (or find some alternative). But ask yourself if it is worth the pain. – Basile Starynkevitch Feb 23 '17 at 18:39
  • **Why do you ask?** Are you coding for a constrained environment (e.g. some embedded device having too little RAM)? Or do you have billions of entries which don't fit on your server? You could easily guess-estimate (by running a few memory benchmarks) how much memory is consumed. In many cases, that should be enough. Otherwise, improve your question (and edit it) to give more motivations. – Basile Starynkevitch Feb 23 '17 at 19:14

2 Answers2

1

It is implementation specific. You can google and find e.g. this and that.

If you really care, look into the source code (e.g. of GCC and its libstdc++ or Clang, and its libc++). Their standard header (e.g. <map>) is including some implementation specific internal headers. You might pass -H (which prints the actually included files) to g++ (or clang++) to understand what actual internal headers are used.

Look also into <string>. It often does small string optimizations. See this (and that).

Internally, when you enable compiler optimizations, weird things might happen (e.g. generation of vector machine instructions à la SSE or AVX to speed up some string operations). And the internal headers might have tricks for them.

So if you really care (and assuming you are using a free software C++ implementation like GCC or Clang), dive into the implementation source code. Or look into the generated assembler code (from your particular source code), e.g. generated with g++ -O -fverbose-asm -S. Or look into the GIMPLE internal representation (using g++ -fdump-tree-gimple -O etc...).

(the internal headers might not contain very readable code and could use compiler magic or tricks, but compiler and standard C++ library implementors -practically, they are the same team- are doing tricky stuff for efficiency)

One of the interests of recent C++ is to take advantage of the abstractions provided and possible by its standard library. So why do you bother about implementation details? Can't you trust the implementation?

(If you care about implementation details, be sure to ask for optimizations with -O or -O2 at least, because the compiler is doing a lot of them)


Addenda, after motivations

Notice that it is not advised to code Linux kernel modules in C++, in particular because the kernel ABI and calling conventions are not compatible with those of GCC. There are many infelicities, e.g. exceptions (and the generated code contain stuff related to them) and other things. Be sure to look into the generated assembler code.

(if you dare coding kernel stuff in C++, be aware of all the details, including calling conventions and kernel ABI. And these details are more scary, and much more complex, than just the implementation of std::map)

If you are coding only a user-land library above specific kernel primitives (in C), you need to care about details like ASLR. Perhaps looking into existing application checkpointing frameworks (or persistence implementations) could be inspirational. And garbage collection has a lot of concepts & terminology in common with your persistent key store, so reading the GC handbook should be helpful.

Then my current advice is to avoid coding your thing in C++; you surely should care about implementation details, and C++ does a good job in hiding them to you. So I feel your approach is flawed, and certainly very brittle! I would recommend coding explicitly some red-black trees in C, without any C++ ; then you'll be aware of every implementation detail

Community
  • 1
  • 1
Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • I am using GCC. I mostly code in C though. I am trying to allocate all the heap memory from a custom memory pool for a research purpose. However C++ memory allocation is not that straight forward as I see it now. As far as I understand from your comments now, the allocator for map only allocates the node from heap, most probably it contains the key(or pointer to the key) and pointer to the value also. I have to check the headers to make sure. Highly appreciate your suggestion to explore the headers. Thanks! – mchow Feb 23 '17 at 18:37
  • 1
    I'm not expert with these headers (even if I know a bit of the internals of GCC). If you need to explore them, you should budget several weeks of work. Are you so sure it is worth spending such efforts? The advantage of C++ standard containers is to *abstract* these implementation details. – Basile Starynkevitch Feb 23 '17 at 18:41
  • Hmm, that's not a good news as I have a deadline coming up. I guess I have to explore them later. Thanks for the heads up! I got a few ideas from your comments and answers on what to do next. I will update the post when I can get to the bottom of it. – mchow Feb 23 '17 at 18:47
  • @mchow: please **edit your question** to explain why do you really care that much? What is your concrete issue? – Basile Starynkevitch Feb 23 '17 at 19:20
  • @mchow: I still hope you will explain why do you ask, by improving your question with some sentences giving your motivations. I feel you owe that to us. We did spend some time to give answers and links. So you really should explain why do you ask (and simple curiosity could be a motivation, but I hope it is not the only one). – Basile Starynkevitch Feb 23 '17 at 19:28
  • added motivation. And the kernel module is not developed in C++! I am developing the application using the interfaces from the kernel module. – mchow Feb 23 '17 at 20:08
  • Improved my answer after your edits. My advice for your case is to run away from C++ and code all your thing in plain C. If you use C++ you'll spend more time understanding the details of `std::map` than coding a red-black tree in C by yourself. That is why I strongly recommend you to avoid C++ – Basile Starynkevitch Feb 23 '17 at 20:20
  • Thanks, may be I will go back to C. I am certainly feeling the heat with C++ :) I know about ASLR and possible consequences, but that's another long discussion for another time may be! – mchow Feb 23 '17 at 20:25
1

First thing: std::string has a fixed size (implementation dependent). It usually holds a pointer to a dynamically allocated piece of memory (that you can access with the c_str() method).

Secondly, the internals of map are not specified, it is also implementation dependent. You can take a look at this minimal implementation.

C++ does not support dynamically sized structures (correct me if I'm wrong!) so every dynamic container has an underlying pointer somewhere.

Community
  • 1
  • 1
Peter Lenkefi
  • 1,306
  • 11
  • 29