
I wondered if unordered_map is implemented using type erasure, since an unordered_map<Key, A*> and unordered_map<Key, B*> can use exactly the same code (apart from casting, which is a no-op in machine code). That is, the implementation of both could be based on unordered_map<Key, void*> to save code size.

Update: This technique is commonly referred to as the Thin Template Idiom (Thanks to the commenters below for pointing that out).

Update 2: I would be particularly interested in Howard Hinnant's opinion. Let's hope he reads this.

So I wrote this small test:

#include <iostream>
#include <string>

#if BOOST
# include <boost/unordered_map.hpp>
  using boost::unordered_map;
#else
# include <unordered_map>
  using std::unordered_map;
#endif

struct A { A(int x) : x(x) {} int x; };
struct B { B(int x) : x(x) {} int x; };

int main()
{
#if SMALL
    unordered_map<std::string, void*> ma, mb;
#else
    unordered_map<std::string, A*> ma;
    unordered_map<std::string, B*> mb;
#endif

    ma["foo"] = new A(1);
    mb["bar"] = new B(2);

    std::cout << ((A*) ma["foo"])->x << std::endl;
    std::cout << ((B*) mb["bar"])->x << std::endl;

    // yes, it leaks.
}

And determined the size of the compiled output with various settings:

#!/bin/sh

for BOOST in 0 1 ; do
    for OPT in 2 3 s ; do
        for SMALL in 0 1 ; do
            clang++ -stdlib=libc++ -O${OPT} -DSMALL=${SMALL} -DBOOST=${BOOST} map_test.cpp -o map_test
            strip map_test
            SIZE=$(echo "scale=1;$(stat -f "%z" map_test)/1024" | bc)
            echo boost=$BOOST opt=$OPT small=$SMALL size=${SIZE}K
        done
    done
done

It turns out that, with all the settings I tried, much of the inner code of unordered_map is instantiated twice:

With Clang and libc++:
          |   -O2   |   -O3   |   -Os
-DSMALL=0 |  24.7K  |  23.5K  |  28.2K
-DSMALL=1 |  17.9K  |  17.2K  |  19.8K


With Clang and Boost:
          |   -O2   |   -O3   |   -Os
-DSMALL=0 |  23.9K  |  23.9K  |  32.5K
-DSMALL=1 |  17.4K  |  17.4K  |  22.3K


With GCC and Boost:
          |   -O2   |   -O3   |   -Os
-DSMALL=0 |  21.8K  |  21.8K  |  35.5K
-DSMALL=1 |  16.4K  |  16.4K  |  26.2K

(With the compilers from Apple's Xcode)

Now to the question: Is there some convincing technical reason due to which the implementers have chosen to omit this simple optimization?

Also: why the hell is the effect of -Os exactly the opposite of what is advertised?

Update 3:

As suggested by Nicol Bolas, I have repeated the measurements with shared_ptr<void/A/B> instead of naked pointers (created with make_shared and cast with static_pointer_cast). The tendency in the results is the same:

With Clang and libc++:
          |   -O2   |   -O3   |   -Os
-DSMALL=0 |  27.9K  |  26.7K  |  30.9K
-DSMALL=1 |  25.0K  |  20.3K  |  26.8K


With Clang and Boost:
          |   -O2   |   -O3   |   -Os
-DSMALL=0 |  35.3K  |  34.3K  |  43.1K
-DSMALL=1 |  27.8K  |  26.8K  |  32.6K
marton78
  • 'Type-erasure' tends to refer to a very particular technique when it comes to C++, and that's not it here. As far as I can remember, there is no particular name for the technique of writing a partial specialization in terms of another specialization (or a common implementation), although it is a known technique. (Someone did suggest "thin template" though.) – Luc Danton Jun 25 '12 at 12:45
  • Well, I call it type erasure because the type of the pointer is essentially erased under the hood to allow for a common implementation. See also [this question](http://stackoverflow.com/questions/5450159/type-erasure-techniques). – marton78 Jun 25 '12 at 13:02
  • I'd only call it "type erasure" if the type is *visibly* erased (see `function` and `shared_ptr`), which would not be the case here. – Xeo Jun 25 '12 at 18:05
  • Have you considered that it is generally not a good idea to store naked pointers in standard library containers, due to lifetime and ownership issues? Good modern C++ programming practice suggests using a `unique_ptr`, `shared_ptr`, or some other form of smart pointer instead of a naked pointer. At which point your thin template "optimization" is of zero value. – Nicol Bolas Jun 25 '12 at 19:54
  • Weird, but I am satisfied (and not very surprised) to see that the executable size is *always smaller* when compiled with optimisation for speed than when compiled with optimisation for size. – Konrad Rudolph Jun 25 '12 at 19:56
  • @NicolBolas: Sure, smart pointers are better, but for the sake of the argument I simplified the question. The same optimization is still possible with smart pointers, just base your implementation on `some_ptr` and use `static_pointer_cast`. See above. – marton78 Jun 26 '12 at 08:44
  • @KonradRudolph: Interesting. Why is that not surprising? Also, why is the result of `-O3` even smaller than that of `-O2` (which is supposed to turn on all optimizations that don't increase code size)? – marton78 Jun 26 '12 at 08:48
  • @marton78: "The same optimization is still possible with smart pointers" You cannot cast `unique_ptr`s. For obvious reasons. Furthermore, you would have to do this for every kind of smart pointer you would want to use. `std::shared_ptr`, `boost::shared_ptr`, etc. After a while, you have to stop guessing what the user's going to do. – Nicol Bolas Jun 26 '12 at 16:19
  • @NicolBolas: Sure, you cannot cast `unique_ptr`s but you _can_ cast the underlying raw pointer. You are right of course that it would have to be done for all smart pointer types, but this can be accomplished with template-templates and `enable_if` to detect (supported) smart pointers. Again, this is not something to make the library implementer's life easier, but rather that of the library user. In my application I _do_ need `unordered_map`s of many different pointer types. – marton78 Jun 26 '12 at 17:38
  • @marton78: "Again, this is not something to make the library implementer's life easier, but rather that of the library user." Yes it is. But there are *limits* that the implementation is going to go to for these kinds of cases. They're not going to implement this optimization across all *twelve* containers. All for what, 7KB per instantiation? Personally, I'd say that if you have an explicit need for this optimization, you can write (and maintain) a wrapper class template yourself. – Nicol Bolas Jun 26 '12 at 17:54

2 Answers


Since I've been specifically asked to comment, I will, though I'm not sure I have much more to add than has already been said. (sorry it took me 8 days to get here)

I've implemented the thin template idiom before, for some containers, namely vector, deque and list. I don't currently have it implemented for any container in libc++. And I've never implemented it for the unordered containers.

It does save on code size. It also adds complexity, much more so than the referenced wikibooks link implies. One can also do it for more than just pointers. You can do it for all scalars which have the same size. For example, why have different instantiations for int and unsigned? Even ptrdiff_t can be stored in the same instantiation as T*. After all, it is all just a bag of bits at the bottom. But it is extremely tricky to get the member templates which take a range of iterators correct when playing these tricks.

There are disadvantages though (besides difficulty of implementation). It doesn't play nearly as nicely with the debugger. At the very least it makes it much more difficult for the debugger to display container innards. And while the code size savings can be significant, I would stop short of calling the code size savings dramatic. Especially when compared to the memory required to store the photographs, animations, audio clips, street maps, years of email with all of the attachments from your best friends and family, etc. I.e. optimizing code size is important. But you should take into account that in many apps today (even on embedded devices), if you cut your code size in half, you might cut your app size by 5% (statistics admittedly pulled from thin air).

My current position is that this particular optimization is one best paid for and implemented in the linker instead of in the template container. Though I know this isn't easy to implement in the linker, I have heard of successful implementations.

That being said, I still do try to make code size optimizations in templates. For example in the libc++ helper structures such as __hash_map_node_destructor are templated on as few parameters as possible, so if any of their code gets outlined, it is more likely that one instantiation of the helper can serve more than one instantiation of unordered_map. This technique is debugger friendly, and not that hard to get right. And can even have some positive side effects for the client when applied to iterators (N2980).

In summary, I wouldn't hold it against an implementation for going the extra mile and implementing this optimization. But I also wouldn't classify it as high a priority as I did a decade ago, both because linker technology has progressed, and because the ratio of code size to application size has tended to decrease fairly dramatically.

Howard Hinnant

When you have a void* parameter, there is no type checking at compile time.

Maps like those you propose would be a flaw in a program, since they would accept value elements of type A*, B*, and any number of other pointer types that have no business being in that map (for example int*, float*, std::string*, CString*, CWnd*... imagine the mess in your map).

Your optimisation is premature, and premature optimization is the root of all evil.

Stephane Rolland
  • In user code, I agree. There premature optimization is evil. But in a library, the authors should make sure their code is efficient. And regarding the hole in the type system: of course the library author could hide it by having `unordered_map` instantiate some hidden `detail::unordered_map_impl` under the hood. – marton78 Jun 25 '12 at 11:19
  • The type checking is performed by a [thin template](http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Thin_Template) that simply delegates to a common base class which is not type-safe. All the user sees is the template facade, so there is no flaw regarding static typing. (I think Scott Meyers wrote an item about that in Effective or More Effective C++, but I don't have them at hand to check.) – Luc Touraille Jun 25 '12 at 12:08
  • Thanks for the link. Although having done that myself, I didn't know the name of this idiom was Thin Template. – marton78 Jun 25 '12 at 15:15