I wondered if unordered_map
is implemented using type erasure, since an unordered_map<Key, A*>
and unordered_map<Key, B*>
can use exactly the same code (apart from casting, which is a no-op in machine code). That is, the implementation of both could be based on unordered_map<Key, void*>
to save code size.
Update: This technique is commonly referred to as the Thin Template Idiom (Thanks to the commenters below for pointing that out).
Update 2: I would be particlarly interested in Howard Hinnant's opinion. Let's hope he reads this.
So I wrote this small test:
#include <iostream>
#if BOOST
# include <boost/unordered_map.hpp>
using boost::unordered_map;
#else
# include <unordered_map>
using std::unordered_map;
#endif
struct A { A(int x) : x(x) {} int x; };
struct B { B(int x) : x(x) {} int x; };
int main()
{
#if SMALL
unordered_map<std::string, void*> ma, mb;
#else
unordered_map<std::string, A*> ma;
unordered_map<std::string, B*> mb;
#endif
ma["foo"] = new A(1);
mb["bar"] = new B(2);
std::cout << ((A*) ma["foo"])->x << std::endl;
std::cout << ((B*) mb["bar"])->x << std::endl;
// yes, it leaks.
}
And determined the size of the compiled output with various settings:
#!/bin/sh
for BOOST in 0 1 ; do
for OPT in 2 3 s ; do
for SMALL in 0 1 ; do
clang++ -stdlib=libc++ -O${OPT} -DSMALL=${SMALL} -DBOOST=${BOOST} map_test.cpp -o map_test
strip map_test
SIZE=$(echo "scale=1;$(stat -f "%z" map_test)/1024" | bc)
echo boost=$BOOST opt=$OPT small=$SMALL size=${SIZE}K
done
done
done
It turns out, that with all settings I tried, lots of inner code of unordered_map
seems to be instantiated twice:
With Clang and libc++:
| -O2 | -O3 | -Os
-DSMALL=0 | 24.7K | 23.5K | 28.2K
-DSMALL=1 | 17.9K | 17.2K | 19.8K
With Clang and Boost:
| -O2 | -O3 | -Os
-DSMALL=0 | 23.9K | 23.9K | 32.5K
-DSMALL=1 | 17.4K | 17.4K | 22.3K
With GCC and Boost:
| -O2 | -O3 | -Os
-DSMALL=0 | 21.8K | 21.8K | 35.5K
-DSMALL=1 | 16.4K | 16.4K | 26.2K
(With the compilers from Apple's Xcode)
Now to the question: Is there some convincing technical reason due to which the implementers have chosen to omit this simple optimization?
Also: why the hell is the effect of -Os
exactly the opposite of what is advertised?
Update 3:
As suggested by Nicol Bolas, I have repeated the measurements with shared_ptr<void/A/B>
instead of naked pointers (created with make_shared
and cast with static_pointer_cast
). The tendency in the results is the same:
With Clang and libc++:
| -O2 | -O3 | -Os
-DSMALL=0 | 27.9K | 26.7K | 30.9K
-DSMALL=1 | 25.0K | 20.3K | 26.8K
With Clang and Boost:
| -O2 | -O3 | -Os
-DSMALL=0 | 35.3K | 34.3K | 43.1K
-DSMALL=1 | 27.8K | 26.8K | 32.6K