
I wish to create an alternative to std::type_index that does not require RTTI:

template <typename T>
int* type_id() {
    static int x;
    return &x;
}

Note that the address of the local variable x is used as the type ID, not the value of x itself. Also, I don't intend to use a bare pointer in reality. I've just stripped out everything not relevant to my question. See my actual type_index implementation here.
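A quick sketch of the properties the question relies on, assuming the `type_id` definition above (repeated so the block compiles on its own); `check_type_id_properties` is a hypothetical helper. It also shows a side effect worth knowing: cv-qualified and reference types are different template arguments, so they get IDs of their own.

```cpp
#include <cassert>

// The question's technique, repeated so this block compiles on its own:
// every specialization has its own static object, and its address is the ID.
template <typename T>
int* type_id() {
    static int x;  // zero-initialized; only its address is ever used
    return &x;
}

// Hypothetical helper that spells out the properties the question relies on.
inline void check_type_id_properties() {
    // Same template argument: same specialization, same static object.
    assert(type_id<int>() == type_id<int>());
    // Different template arguments: different objects, so different addresses.
    assert(type_id<int>() != type_id<float>());
    // cv-qualified and reference types are different template arguments too.
    assert(type_id<int>() != type_id<int const>());
    assert(type_id<int>() != type_id<int&>());
}
```

If a caller might pass `int const` where a handler was registered for `int`, stripping qualifiers first (for example with `std::decay_t<T>`) is one option; whether that is desirable depends on the use case.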

Is this approach sound, and if so, why? If not, why not? I feel like I am on shaky ground here, so I am interested in the precise reasons why my approach will or will not work.

A typical use case might be to register routines at run-time to handle objects of different types through a single interface:

#include <functional>
#include <map>
#include <stdexcept>

class processor {
public:
    template <typename T, typename Handler>
    void register_handler(Handler handler) {
        handlers[type_id<T>()] = [handler](void const* v) {
            handler(*static_cast<T const*>(v));
        };
    }

    template <typename T>
    void process(T const& t) {
        auto it = handlers.find(type_id<T>());
        if (it != handlers.end()) {
            it->second(&t);
        } else {
            throw std::runtime_error("handler not registered");
        }
    }

private:
    std::map<int*, std::function<void (void const*)>> handlers;
};

This class might be used like so:

processor p;

p.register_handler<int>([](int const& i) {
    std::cout << "int: " << i << "\n";
});
p.register_handler<float>([](float const& f) {
    std::cout << "float: " << f << "\n";
});

try {
    p.process(42);
    p.process(3.14f);
    p.process(true);
} catch (std::runtime_error& ex) {
    std::cout << "error: " << ex.what() << "\n";
}
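For reference, here is the question's `type_id`, `processor` and usage example assembled into one self-contained unit. The only change, an assumption made so that the output can be checked, is that the handlers write to a caller-supplied `std::ostream` instead of `std::cout`; `run_example` is a hypothetical name. Keying a `std::map` on `int*` is fine because `std::less<int*>` yields a strict total order even for pointers to unrelated objects.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>

template <typename T>
int* type_id() {
    static int x;  // one object per T; its address is the ID
    return &x;
}

class processor {
public:
    template <typename T, typename Handler>
    void register_handler(Handler handler) {
        // Type-erase the handler behind a void const* interface.
        handlers[type_id<T>()] = [handler](void const* v) {
            handler(*static_cast<T const*>(v));
        };
    }

    template <typename T>
    void process(T const& t) {
        auto it = handlers.find(type_id<T>());
        if (it != handlers.end()) {
            it->second(&t);
        } else {
            throw std::runtime_error("handler not registered");
        }
    }

private:
    std::map<int*, std::function<void (void const*)>> handlers;
};

// The usage example from the question, writing to a caller-supplied stream.
inline void run_example(std::ostream& out) {
    processor p;
    p.register_handler<int>([&out](int const& i) { out << "int: " << i << "\n"; });
    p.register_handler<float>([&out](float const& f) { out << "float: " << f << "\n"; });
    try {
        p.process(42);
        p.process(3.14f);
        p.process(true);  // no handler registered for bool: throws
    } catch (std::runtime_error& ex) {
        out << "error: " << ex.what() << "\n";
    }
}
```

Calling `run_example(std::cout)` should print the same three lines as the original snippet.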

Conclusion

Thanks to everyone for your help. I have accepted the answer from @StoryTeller, as he has outlined why the solution should be valid according to the rules of C++. However, @SergeBallesta and a number of others in the comments have pointed out that MSVC performs optimizations which come uncomfortably close to breaking this approach. If a more robust approach is needed, then a solution using std::atomic may be preferable, as suggested by @galinette:

#include <atomic>
#include <cstddef>

std::atomic_size_t type_id_counter{0};  // brace-init: std::atomic is not copyable before C++17

template <typename T>
std::size_t type_id() {
    static std::size_t const x = type_id_counter++;
    return x;
}
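A sketch of the counter-based variant above, assuming C++17 so that the counter can be an `inline` variable defined in a header (in C++11/14 it would instead be declared `extern` in the header and defined in a single `.cpp` file). `check_counter_ids` is a hypothetical helper. Note that the IDs depend on the order of first calls, so they may differ between runs or builds.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumes C++17: an inline variable may be defined in a header without ODR problems.
inline std::atomic<std::size_t> type_id_counter{0};

template <typename T>
std::size_t type_id() {
    // Initialized once per specialization; since C++11 this initialization is
    // thread-safe, and the atomic increment keeps concurrent first calls for
    // different types from racing on the shared counter.
    static std::size_t const id = type_id_counter++;
    return id;
}

// Hypothetical helper checking the properties the handler table relies on.
inline void check_counter_ids() {
    std::size_t const a = type_id<int>();
    std::size_t const b = type_id<float>();
    assert(a != b);               // distinct types, distinct IDs
    assert(type_id<int>() == a);  // stable for the lifetime of the program
}
```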

If anyone has further thoughts or information, I am still eager to hear it!

Joseph Thomson
  • @A.S.H I've added a typical use case. Obviously I've simplified it, but I have used this pattern in actual production code. – Joseph Thomson Jan 26 '17 at 07:20
  • Thanks for sharing. Please take a look at my comment to the answer of @galinette. Since the `handlers` map is mainly used as an indirection tool for the execution, it is expected to be much faster if replaced by a vector, and using some auto-incremented enum for the types instead of some random pointers. – A.S.H Jan 26 '17 at 08:28
  • Side note: use an `unordered_map` instead of a `map`; the performance may vary a lot, and you don't use any ordering – galinette Jan 26 '17 at 08:32
  • @A.S.H It would undoubtedly be faster, though you could end up with a bunch of unused entries in the `vector` if the `processor` wasn't the only thing using `type_id`. An `unordered_map` or a sorted `vector` would probably be faster than a `map`. In reality, the "handlers" would be doing far more work than they are in the example, so the lookup time for the routines would probably be negligible. – Joseph Thomson Jan 26 '17 at 08:33
  • If you really want to use a pointer as a type ID, at least make it clearer in the code: do not return a pointer; use uintptr_t, which is an unsigned integer type large enough to hold a converted `void*`, or typedef it. And comment! – galinette Jan 26 '17 at 08:36
  • @galinette I admit that naming the local variable `id` is misleading, since the address is the actual ID, but I thought the question would be clear from the title. Of course I would comment this code in real life. And I chose not to cast to `uintptr_t` or `void*` because I wanted to keep things as simple as possible. I'd rather not risk invoking additional C++ voodoo if it isn't necessary. – Joseph Thomson Jan 26 '17 at 08:42
  • I've improved my answer with a (possibly hacky) way to secure that pattern against optimizing compilers... – Serge Ballesta Jan 26 '17 at 09:43
  • have you seen this stackoverflow question? http://stackoverflow.com/questions/7562096/compile-time-constant-id – Alessandro Teruzzi Jan 26 '17 at 10:59
  • This seems to be more suitable for code review. – BartoszKP Jan 26 '17 at 12:21
  • You can even use function pointer of an inlined template function as a type id. Just let the function return a pointer to itself. No need for the static variable. – kubanrob Jan 26 '17 at 12:57
  • @kubanrob That's a good idea, though it would have to be a different function. A function returning a pointer to itself will have an infinitely recursive return type! – Joseph Thomson Jan 26 '17 at 13:50
  • The type could just be a `void *`. – nwp Jan 26 '17 at 13:55
  • @nwp Function pointers cannot be cast to `void*` (https://isocpp.org/wiki/faq/pointers-to-members#cant-cvt-fnptr-to-voidptr). – Joseph Thomson Jan 26 '17 at 13:56
  • @JosephThomson You are right ... but afaik you can safely cast every function pointer to any other function pointer type. http://stackoverflow.com/a/11240372/5293824 – kubanrob Jan 26 '17 at 14:04
  • @kubanrob But it does say that the result of such a conversion is unspecified. While I doubt that anything unexpected would happen in practice, doesn't this mean that technically it might (e.g. you might get two different pointers from two separate `reinterpret_cast`s of the same pointer). – Joseph Thomson Jan 26 '17 at 14:28
  • @kubanrob Using just a function pointer would be far more risky on MSVC. It's very likely that all function template instantiations will be folded into one with the [`/opt:icf`](https://msdn.microsoft.com/en-us/library/bxwfs976.aspx) linker option. – bogdan Jan 26 '17 at 14:40
  • @bogdan I was just about to ask about that. Just gave it a test on MSVC, and it does indeed fold them all into one. Scratch that idea. – Joseph Thomson Jan 26 '17 at 14:53

4 Answers


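Yes, it will be correct to an extent. Function templates are not formally implicitly inline, but the one-definition rule treats them the same way: the definition may appear in every translation unit that uses it, and each specialization is still a single function, so its static local is a single object shared across all translation units.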

So, in every translation unit, you will get the address of the same static local variable for the call to type_id<Type>(). You are protected here from ODR violations by the standard.

Therefore, the address of the local static can be used as a sort of home-brewed run-time type identifier.
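The question's real `type_index` is not shown here, so this is only a hypothetical wrapper (`type_index_lite` and `type_index_of` are made-up names) sketching how the address-based ID can be hidden behind a small value type. It works as a `std::map` key through `std::less`, which gives a strict total order even for unrelated pointers, and as a `std::unordered_map` key through a `std::hash` specialization, as suggested in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

template <typename T>
int* type_id() {
    static int x;  // one object per T; its address is the ID
    return &x;
}

// Hypothetical value type wrapping the raw pointer.
class type_index_lite {
public:
    explicit type_index_lite(int* p) : p_(p) {}

    friend bool operator==(type_index_lite a, type_index_lite b) { return a.p_ == b.p_; }
    friend bool operator!=(type_index_lite a, type_index_lite b) { return !(a == b); }
    // The built-in < leaves the result unspecified for unrelated pointers;
    // std::less is guaranteed to give a strict total order.
    friend bool operator<(type_index_lite a, type_index_lite b) {
        return std::less<int*>{}(a.p_, b.p_);
    }

    std::size_t hash() const { return std::hash<int*>{}(p_); }

private:
    int* p_;
};

template <typename T>
type_index_lite type_index_of() { return type_index_lite(type_id<T>()); }

// Lets type_index_lite be used as an unordered_map key.
namespace std {
template <>
struct hash<type_index_lite> {
    size_t operator()(type_index_lite t) const noexcept { return t.hash(); }
};
}  // namespace std
```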

Cody Gray - on strike
StoryTeller - Unslander Monica
  • The problem is not whether a type will always be represented by the same address, but whether 2 different types could share the same address... – Serge Ballesta Jan 26 '17 at 07:52
  • @SergeBallesta - `type_id<T>()` and `type_id<U>()` (for distinct `T` and `U`) are completely different functions. I see no leeway in the standard that allows them to share the same static locals. – StoryTeller - Unslander Monica Jan 26 '17 at 07:53
  • @SergeBallesta - In fact [§14.8 ¶2](http://eel.is/c++draft/temp#fct.spec-2) – StoryTeller - Unslander Monica Jan 26 '17 at 07:59
  • I've reread §14.8 ¶2. It does state that each template function specialization has its own copy of the static variable; no problem with that. My question is whether an optimizing compiler can merge variables that share the same value throughout the program. – Serge Ballesta Jan 26 '17 at 09:08
  • @SergeBallesta Not if their addresses are observable ([\[intro.object\]/8](http://eel.is/c++draft/intro.object#8)). That said, I've seen MSVC with the [`/opt:icf`](https://msdn.microsoft.com/en-us/library/bxwfs976.aspx) linker option merge some COMDATs in non-standard ways (the behaviour is at least documented). I'm not sure if OP's solution would be affected, but the one I gave [here](https://stackoverflow.com/questions/41868077/is-it-safe-to-use-the-address-of-a-static-local-variable-within-a-function-templ#comment70922303_41868897) shouldn't be, as the variable `id` is not `const`. – bogdan Jan 26 '17 at 09:34
  • @SergeBallesta After doing a couple of tests, it looks like OP's code will be safe on MSVC as long as the local static is not `const`. MSVC's non-standard behaviour was discussed [here](https://stackoverflow.com/q/29056890/4326278). That question uses MSVC 2013. In my tests, MSVC 2015 U3 doesn't have the problem (at least in a simple test), but the latest 2017 RC does. Non-const works fine across versions. – bogdan Jan 26 '17 at 13:29

This is consistent with the standard because C++ uses templates, not generics with type erasure as in Java, so each type used as a template argument gets its own function instantiation containing its own static variable. All those variables are distinct objects and as such should have different addresses.

The problem is that their values are never used and, worse, never changed. I remember that optimizers can merge string constants. As optimizers do their best to be far more clever than any human programmer, I am afraid that an overzealous optimizing compiler could discover that, since those variables are never changed, they all keep the value 0, so why not merge them all to save memory?

I know that, because of the as-if rule, the compiler is free to do what it wants provided the observable results are the same. And I am not sure that the addresses of static variables that will always share the same value shall be different or not. Maybe someone could confirm which part of the standard actually covers it?

Current compilers still compile translation units separately, so they cannot be sure whether another translation unit will use or change the value. So in my opinion the optimizer will not have enough information to decide to merge the variables, and your pattern is safe.

But as I really do not think the standard protects it, I cannot say whether future versions of C++ toolchains (compiler + linker) will not invent a global optimizing phase that actively searches for unchanged variables that could be merged, more or less the same way they actively look for UB to optimize away parts of code... Only common patterns, where disallowing them would break too large a code base, are protected from that, and I do not think yours is common enough.

A rather hacky way to prevent an optimizing phase from merging variables that have the same value would be to give each one a different value:

int unique_val() {
    static int cur = 0;  // normally useless but more readable
    return cur++;
}
template <typename T>
void * type_id() {
    static int x = unique_val();
    return &x;
}

OK, this does not even try to be thread-safe, but that is not a problem here: the values are never used by themselves. But you now have different variables with static storage duration (per §14.8 ¶2 of the standard, as said by @StoryTeller) that, except in race conditions, have different values. As they are odr-used, they must have different addresses, and you should be protected against future improvements of optimizing compilers...
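As written, the hack has a data race if two threads make the first call to `type_id` for different types at the same time, because both touch `cur`. A minimal variation, assuming the counter is simply made atomic (as Arne Vogel suggests in the comments), removes that race; the initialization of each `static int x` is itself already thread-safe since C++11.

```cpp
#include <atomic>
#include <cassert>

// The hack above with an atomic counter: concurrent first calls to
// type_id<T>() for different T no longer race on the counter.
inline int unique_val() {
    static std::atomic<int> cur{0};
    return cur++;  // atomic fetch-and-increment; returns the previous value
}

template <typename T>
void* type_id() {
    // Each specialization gets its own static object with its own value.
    static int x = unique_val();
    return &x;
}
```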

Note: I think that as the value will not be used, returning a void * sounds cleaner...


Just an addition stolen from a comment by @bogdan: MSVC is known to have very aggressive optimization with the /OPT:ICF flag. The discussion suggests that it is not conformant, and that it only applies to variables marked as const. But it reinforces my opinion that, even if OP's code seems conformant, I would not dare to use it in production code without additional precautions.

Serge Ballesta
  • What if it's declared static volatile? Will that keep the optimizer at bay? – hookenz Jan 26 '17 at 09:44
  • Why not declare a static pointer of type T? In that case the variables will be of different types, so it should prevent clever optimizations. – Alessandro Teruzzi Jan 26 '17 at 09:53
  • @AlessandroTeruzzi A constant value of 0 for an int, long, char and their unsigned equivalents could all share the same address... – Serge Ballesta Jan 26 '17 at 09:57
  • You could use a type like `template <typename T> struct dummy {};`. Presumably the compiler wouldn't be able to optimize those instances away. Then return a `void*`, assuming there isn't any problem with using `==`, `<` and `hash` on `void*` pointers that point to different types of objects. – Joseph Thomson Jan 26 '17 at 10:15
  • @JosephThomson: AFAIK, the result of `<` on pointers to objects that are not members of the same array, or to bytes not belonging to the same object, is unspecified... Why do you need that? But it could be a different question... – Serge Ballesta Jan 26 '17 at 10:34
  • I meant `std::less`, not `<`. My question is whether it is okay to compare and hash `void*` in this way, and whether this would stop the optimizer. – Joseph Thomson Jan 26 '17 at 10:47
  • `uintptr_t x = (uintptr_t)&x;` merge THAT, you smarty pants compiler! – n. m. could be an AI Jan 26 '17 at 11:00
  • "And I am not sure that the addresses of static variables that will always share the same value shall be different or not." – "Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses." (§1.8/6 in N4140). No point in wasting time to protect against potential future insanity IMO. – Arne Vogel Jan 26 '17 at 12:37
  • FWIW the comparison with generics is not very relevant: In C# (and other .NET languages), a static member of a generic type [will have an instance per reification](http://stackoverflow.com/a/28235563/3764814), just like with templates. In Java, OTOH, you'll have only one instance of a static field in a generic class, due to type erasure (you can't even specify the type parameters when accessing a static field of a generic class). So it very much depends on the way the language implements generics, not on the mechanism itself. – Lucas Trzesniewski Jan 26 '17 at 12:54
  • @ArneVogel: The note 4) in same paragraph says *Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference*. – Serge Ballesta Jan 26 '17 at 13:42
  • @LucasTrzesniewski: I was thinking about Java generics. I have made it more explicit. – Serge Ballesta Jan 26 '17 at 13:44
  • @Serge Ballesta: If my well-formed program cannot observe the difference, why would I care? The hack you propose may actually introduce UB in a multi-threaded program, as you yourself point out, which is IMO a worse can of worms. Obviously an atomic would solve that issue, but then again, I'm not convinced you're trying to solve a problem that currently exists… By the way, implementations are allowed to merge string _literals_… "string constants" is an ambiguous term here. They are certainly not allowed to merge objects of type `const char[]`. – Arne Vogel Jan 26 '17 at 14:07
  • @n.m I think casting to `uintptr_t` may not be guaranteed to work, since pointers may have multiple integer representations (http://en.cppreference.com/w/cpp/language/reinterpret_cast). – Joseph Thomson Jan 26 '17 at 14:59
  • @SergeBallesta: Well of course the *as-if* rule applies - but if the program is going to compare the address of the two objects and behave differently if they are the same, then the program *can* observe the difference, and as-if does not apply. That's just saying that *if you don't take the address of the objects*, they can overlap (or if you do, that you don't compare these addresses). – Martin Bonner supports Monica Jan 26 '17 at 15:21
  • @JosephThomson sorry it's a joke. different objects are guaranteed to have distinct addresses (unless subobjects are involved, which are not present here). – n. m. could be an AI Jan 26 '17 at 16:09
  • @JosephThomson : in your link from cppreference, I read : "2) Any pointer can be converted to any integral type large enough to hold the value of the pointer (e.g. to std::uintptr_t)" – galinette Jan 27 '17 at 13:43
  • @galinette Yes, but it doesn't specify what the result of the conversion will be. It says that a pointer may have multiple integer representations: "the round-trip conversion in the opposite direction is not guaranteed; the same pointer may have multiple integer representations". This would mean that the `uintptr_t` may not necessarily be a unique identifier. – Joseph Thomson Jan 27 '17 at 15:07
  • @JosephThomson: you misread the whole sentence. "A pointer converted to an integer of sufficient size and back to the same pointer type is guaranteed to have its original value, otherwise the resulting pointer cannot be dereferenced safely (the round-trip conversion in the opposite direction is not guaranteed; the same pointer may have multiple integer representations)" It means that if the integer is large enough, the round-trip is guaranteed. If the integer is not large enough, it is not guaranteed. uintptr_t is there exactly for that purpose, since the size is guaranteed if this type is defined. – galinette Jan 27 '17 at 15:29
  • @galinette I don't think so. It specifically talks about the round trip conversion from integer to pointer to integer not being guaranteed, since pointers may have multiple integer representations. I think this is an interpretation of the standard where it says, "The mapping function [from pointer to integer] is implementation-defined." – Joseph Thomson Jan 28 '17 at 03:52
  • "A pointer converted to an integer of sufficient size and back to the same pointer type is guaranteed to have its original value". Again I am 100% sure of this, and this is the very reason for uintptr_t existence. If a compiler cannot guarantee the pointer<>integer conversion, it must not declare this type. This is why the type is optional. – galinette Jan 28 '17 at 09:30
  • @galinette All that appears to be guaranteed is that casting from pointer to integer (of sufficient size) and back to pointer again will give you the original pointer. The standard doesn't specify what the result of the pointer to integer conversion will be though, so it is technically not safe to assume that casting a pointer to an integer will always give you the same value. The pointer may have multiple integer representations. Whether or not this happens in practice is another question. – Joseph Thomson Jan 29 '17 at 03:33

Post-comment edit: I did not realize at first read that the address was used as the key, not the int value. That's a clever approach, but IMHO it suffers from a major flaw: the intent is very unclear to someone else who finds that code.

It looks like an old C hack. It's clever and efficient, but the code does not explain its intent at all, which in modern C++ is, IMHO, bad. Write code for programmers, not for compilers, unless you have proven that there is a serious bottleneck which requires bare-metal optimization.

I would say it should work but I'm clearly not a language lawyer...

An elegant but complex constexpr solution may be found here or here.

Original answer

It is "safe" in the sense that this is valid C++ and you can access the returned pointer anywhere in your program, as the static local will be initialized at the first function call. There will be one static variable per type T used in your code.

But :

  • Why return a non-const pointer? This allows callers to change the static variable's value, which is clearly not something you want
  • If returning a const pointer, I see no point in returning the pointer rather than the value itself

Also, this approach for getting a type ID only works with the static (compile-time) type, not with the dynamic type of polymorphic objects at run time. So it will never return the derived class type from a base reference or pointer.
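A small sketch of that caveat, assuming the question's `type_id` (repeated here) and a hypothetical `id_of` helper that deduces `T` the same way `processor::process` does: a `Derived` object seen through a `Base&` gets `Base`'s ID. `Base` and `Derived` are illustrative types.

```cpp
#include <cassert>

template <typename T>
int* type_id() {
    static int x;  // one object per T; its address is the ID
    return &x;
}

// Illustrative polymorphic hierarchy.
struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Hypothetical helper: deduces T from the argument's static type,
// just like processor::process(T const&) in the question.
template <typename T>
int* id_of(T const&) { return type_id<T>(); }
```

Dispatching on the dynamic type would need something else, for example a virtual function in the hierarchy, or RTTI, which the question is trying to avoid.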

How will you initialize the static int values? Here you do not initialize them so this is not valid. Maybe you wanted to use the non const pointer for initializing them somewhere?

There are two better possibilities:

1) Specialize the template for all the types you want to support

template <typename T>
int type_id();  // no generic definition: an unsupported type fails at link time

template <>
inline int type_id<char>() {  // explicit specializations are not implicitly inline
    static const int id = 0;
    return id;  //or : return 0
}

template <>
inline int type_id<unsigned int>() {
    static const int id = 1;
    return id;  //or : return 1
}

//etc...

2) Use a global counter

#include <atomic>

std::atomic<int> typeInitCounter{0};  // brace-init: std::atomic is not copyable before C++17

template <typename T>
int type_id() {
    static const int id = typeInitCounter++;
    return id;
}

This last approach is IMHO better because you don't have to manage the list of types. And as pointed out by A.S.H, a zero-based incremented counter allows using a vector instead of a map, which is much simpler and more efficient.
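A sketch of the vector-based table suggested by A.S.H, assuming the atomic counter IDs above and C++17 for the `inline` counter; `vector_processor` is a hypothetical name, not code from the question. Slots for types that have an ID but no handler simply stay empty, which is the waste Joseph Thomson mentions when other code also calls `type_id`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

inline std::atomic<std::size_t> type_id_counter{0};  // assumes C++17

template <typename T>
std::size_t type_id() {
    static std::size_t const id = type_id_counter++;  // dense, zero-based IDs
    return id;
}

// Hypothetical vector-backed variant of the question's processor:
// the dense IDs index straight into the handler table.
class vector_processor {
public:
    template <typename T, typename Handler>
    void register_handler(Handler handler) {
        std::size_t const id = type_id<T>();
        if (id >= handlers_.size()) {
            handlers_.resize(id + 1);  // new slots hold empty std::function objects
        }
        handlers_[id] = [handler](void const* v) {
            handler(*static_cast<T const*>(v));
        };
    }

    template <typename T>
    void process(T const& t) const {
        std::size_t const id = type_id<T>();
        if (id >= handlers_.size() || !handlers_[id]) {
            throw std::runtime_error("handler not registered");
        }
        handlers_[id](&t);
    }

private:
    std::vector<std::function<void(void const*)>> handlers_;
};
```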

Also, use an unordered_map instead of a map for this; you do not need ordering. This gives you O(1) average access instead of O(log(n)).

Community
galinette
  • Your edit covers my earlier comment, so I deleted that. That said: "How will you initialize the static int values? Here you do not initialize them so this is not valid." -- Statics are implicitly initialised to zero. This is guaranteed by the standard. Even if they weren't, it wouldn't be a problem, since the statics aren't accessed. –  Jan 26 '17 at 07:04
  • It doesn't matter if they change the value of the static. What matters is its address only. You sort of missed the point here. Your single sentence about it being a run-time thing only is however the correct and biggest caveat. – StoryTeller - Unslander Monica Jan 26 '17 at 07:12
  • The previous commenters are correct that it is only the address that matters. Besides, in reality I wrap the `int*` in a `type_index` type, as mentioned in the question. I am really interested in _why_ my solution is/is not sound. What rules in C++ mean that the solution will/will not work? Maybe I should add the `language-lawyer` tag. – Joseph Thomson Jan 26 '17 at 07:26
  • The "better" solution involves one additional object, requires people to remember to make the typeInitCounter atomic and is marginally less performant. Really the original is fine, but it'd be more sensible to return an opaque type or `void*`. – Voo Jan 26 '17 at 07:40
  • Voo, that's a subjective matter, but with the "better" approach the intent is clear and easy to understand without a single comment, which for me is good. The pointer key approach (which I did not see at all at first answer, I admit) looks like an old C hack. It's clever and efficient, but the code does not explain its intent at all, which in modern C++, IMHO, is bad. – galinette Jan 26 '17 at 08:13
  • FWIW, returning an auto-incremented identifier has an additional advantage IMO. That is, the `handlers` map in the `processor` object can be made of a vector instead of a map, and that's a great boost of performance since that map is mainly used as an indirection tool. – A.S.H Jan 26 '17 at 08:25
  • As an old C programmer, OP's intent was evident to me :-). But your analysis that *it is not the modern C++ way* confirms my fear: future evolutions of optimizing compilers may not care much about such a pattern... – Serge Ballesta Jan 26 '17 at 09:14
  • @galinette I do like your version using `atomic`. It seems like the "proper" way to do things to me, and I would definitely use it if I wanted the most robust solution. – Joseph Thomson Jan 26 '17 at 15:22
  • To be fair, "clever but not clear" is one of the best reasons to add a comment. `// Maps each distinct template parameter T to a unique address.` clarifies things nicely, for example. – Justin Time - Reinstate Monica Jan 26 '17 at 19:25

As mentioned by @StoryTeller, it works just fine, but only at runtime.
That means you can't use it where a constant expression is required, as follows:

template<int *>
struct S {};

//...

S<type_id<char>()> s;

Moreover, it's not a fixed identifier. Therefore you have no guarantee that char will be bound to the same value across different runs of your executable.

If you can deal with these limitations, it's just fine.


If you already know the types for which you want a persistent identifier, you can use something like this instead (in C++14):

#include <cstddef>
#include <utility>

template<typename T>
struct wrapper {
    using type = T;
    constexpr wrapper(std::size_t N): N{N} {}
    const std::size_t N;
};

template<typename... T>
struct identifier: wrapper<T>... {
    template<std::size_t... I>
    constexpr identifier(std::index_sequence<I...>): wrapper<T>{I}... {}

    template<typename U>
    constexpr std::size_t get() const { return wrapper<U>::N; }
};

template<typename... T>
constexpr identifier<T...> ID = identifier<T...>{std::make_index_sequence<sizeof...(T)>{}};

Then create your identifiers as follows:

constexpr auto id = ID<int, char>;

You can use those identifiers more or less as you did with your other solution:

handlers[id.get<T>()] = ...

Moreover, you can use them wherever a constant expression is required.
For example, as a template argument:

template<std::size_t>
struct S {};

// ...

S<id.get<char>()> s{};

In a switch statement:

switch(value) {
case id.get<char>():
    // ....
    break;
case id.get<int>():
    // ...
    break;
}

And so on. Note also that they are persistent across different runs as long as you don't change the position of a type in the template parameter list of ID.

The main drawback is that you must know all the types for which you need an identifier when you introduce the id variable.
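A compact, self-contained check of the properties claimed above, reusing skypjack's `wrapper`/`identifier`/`ID` definitions unchanged; `name_of` and the three-type list are illustrative assumptions. The `static_assert`s confirm that the IDs are compile-time constants fixed by position in the list, and the `switch` shows them used as case labels.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

template<typename T>
struct wrapper {
    using type = T;
    constexpr wrapper(std::size_t N): N{N} {}
    const std::size_t N;
};

template<typename... T>
struct identifier: wrapper<T>... {
    // Pairs each type in T... with its index in the pack.
    template<std::size_t... I>
    constexpr identifier(std::index_sequence<I...>): wrapper<T>{I}... {}

    template<typename U>
    constexpr std::size_t get() const { return wrapper<U>::N; }
};

template<typename... T>
constexpr identifier<T...> ID = identifier<T...>{std::make_index_sequence<sizeof...(T)>{}};

constexpr auto id = ID<int, char, double>;

// The IDs follow the position in the list, so they are fixed at compile time.
static_assert(id.get<int>() == 0, "int is first");
static_assert(id.get<char>() == 1, "char is second");
static_assert(id.get<double>() == 2, "double is third");

// Hypothetical helper: the IDs are usable as case labels.
inline std::string name_of(std::size_t value) {
    switch (value) {
    case id.get<int>():    return "int";
    case id.get<char>():   return "char";
    case id.get<double>(): return "double";
    default:               return "unknown";
    }
}
```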

Community
skypjack
  • OP's version [can be adapted](http://melpon.org/wandbox/permlink/9F1z6cRglYYwLV07) so that it can be used as a constant expression, with the caveat that it will work nicely as a template argument only from C++17 onwards. – bogdan Jan 26 '17 at 09:15
  • @bogdan Yeah, sure, that's definitely another valid approach. It still has the problem that identifiers change through different executions. Am I wrong? – skypjack Jan 26 '17 at 09:32
  • That's correct, there's no guarantee that they won't change between executions. – bogdan Jan 26 '17 at 09:41
  • @bogdan The code above is what I came up with when I was trying to define something that gives me consistent identifiers across different executions. I thought it was worth sharing the code with the OP, since he has more or less the problems I had at the time. That's all. ;-) – skypjack Jan 26 '17 at 09:44