1

I need to map component IDs to components in a C++ game engine. Those component IDs should be strings. I cannot use std::string because I need extremely high performance performance as high as possible without hurting ease of development too much, and the conversion from const char* to std::string every time I look up an entry is way too costly.

According to this answer, using const char* can lead to comparison issues. What is the right way to use C-style strings as map keys, without those comparison issues?

Community
  • 1
  • 1
Yohaï-Eliel Berreby
  • 1,165
  • 3
  • 14
  • 25
  • 3
    How do you know that conversion from `const char*` to `std::string` is going to kill your "extremely high performance"? I hope the answer is "I profiled it, and the profiler clearly identified the conversion as the bottleneck where 10% or more of CPU cycles are spent" ;-) – Sergey Kalinichenko Jul 16 '14 at 12:51
  • 2
    First of all if you need *extremely high performance* then you should not use strings at all. You may still use strings as an interface to your users, but when you pack data down to be loaded fast, you should use some other unique identifier, for instance a sufficiently large integer. If you still decide you want to use c-strings then the answer is to supply your map (which should probably be an unordered_map, compare them). You need to provide your own comparator (+ hash if you use unordered map.) (See πάντα ῥεῖ's answer for details on that.) – Stian Svedenborg Jul 16 '14 at 12:53
  • @dasblinkenlight In fact, someone else already profiled it, and the results speak for themselves: http://stackoverflow.com/questions/12136071/c-mapstdstring-vs-mapchar-performance-i-know-again?rq=1 – Yohaï-Eliel Berreby Jul 16 '14 at 12:56
  • 1
    @juanchopanza Voted to reopen, because the duplicate's answers don't tell anything about the performance aspect asked for. – πάντα ῥεῖ Jul 16 '14 at 13:00
  • @StianV.Svedenborg I need high performance, but I also need strings because I'm loading data from JSON files. Sometimes you can trade a little CPU time for more ease of development. Using std::string wouldn't ease my development much (I don't need to do fancy operations of keys), but I would noticeably slow my program (thousands, or tens of thousands of lookups per second). – Yohaï-Eliel Berreby Jul 16 '14 at 13:00
  • 1
    @filsmick Let me ask, are you reading JSON files 10000X per second or are you accessing objects created from JSON files 10000X/s? – Stian Svedenborg Jul 16 '14 at 13:07
  • @StianV.Svedenborg I'm reading them once per object creation, then using the returned data lots of times per second. Reading them that often would be plain stupid, since the data won't change during the execution of the program. But there *is* room for optimisation (that is, reading them at the start of the program then storing the result in memory), thanks for pointing it out. – Yohaï-Eliel Berreby Jul 16 '14 at 13:08
  • @filsmick Hehe, (I was more wondering if that was your usecase), ok because then I'd say my suggestion still stands. I suggest the following. When you load your JSON files, place the values you need to lookup in a std::vector and use the index in the vector as your identifier. (i.e. that is what you store in your objects.) Also, create a map to/from the string identifiers to this index. Now, when you need to lookup a value, you use the vector in stead of the map, and you still have your identifier strings available in the two other maps. – Stian Svedenborg Jul 16 '14 at 13:14
  • @StianV.Svedenborg Clever, but that wouldn't do. I will always use string identifiers, so I would never directly use the vector. – Yohaï-Eliel Berreby Jul 16 '14 at 13:19
  • 1
    @πάνταῥεῖ If performance is an issue, then using an `std::map` is the probably wrong approach anyway. – juanchopanza Jul 16 '14 at 13:19
  • @filsmick Then my best suggestion is to use `std::string`s all the way, i.e. to load your strings as std::strings and not C-strings in the first place. Once that is done, **always pass them by reference.** If you do that you will incur next to no performance impact for using std::strings over C-strings. Also, you really want to use the hash-map... – Stian Svedenborg Jul 16 '14 at 13:22
  • @juanchopanza What would you use, then? I want to optimize as much as I can, even if it means a lot of work at the beginning. But I'm not ready to sacrifice that much ease of development for a few CPU cycles. To give you an idea, I need to be able to do that: `actor->getComponent("ComponentName")` – Yohaï-Eliel Berreby Jul 16 '14 at 13:23
  • @StianV.Svedenborg I see what you mean, but I should be able to write the name of the component directly, as a string literal. See my comment above ^ – Yohaï-Eliel Berreby Jul 16 '14 at 13:24
  • Depending on the size of the data, a plain array with linear search could be faster. Or a hash map. You should try different things out. – juanchopanza Jul 16 '14 at 13:25
  • @juanchopanza I don't quite see how I would use an array to look up values based on keys, unless the keys are indexes. If I don't find something better, I'll just use the comparator solution. – Yohaï-Eliel Berreby Jul 16 '14 at 13:26
  • @filsmick They profiled their program, not yours. Even assuming that the lookup is three times slower in *your* program with *your* strings, it says nothing about the proportion of the time spent in the lookup to the time spent in the rest of your code, which is the only thing that really matters when you are optimizing. – Sergey Kalinichenko Jul 16 '14 at 13:30
  • @dasblinkenlight My program does very similar things to theirs, when talking about map lookups. If something is called tens of thousands of times in my code, it *needs* to be fast. Very fast. I know well the quote "Premature optimisation is the root of all evil", but in games, performance really matters. – Yohaï-Eliel Berreby Jul 16 '14 at 13:32
  • Disagree with the votes to reopen. While @filsmick may have a good question, it's not this one. Also, he doesn't exactly have good code when his **performance-critical** code is doing `actor->getComponent("ComponentName")` – MSalters Jul 16 '14 at 14:23
  • @MSalters May I ask why this code isn't good? I don't want to directly access the map to preserve encapsulation, and the getComponent method is short and will probably be inlined (note, people will probably complain about inline breaking encapsulation. For me it doesn't, since the user of the class doesn't have to know about the mechanisms of `getComponent()`). If you are talking about using a string as a key, I discussed the reasons for this above. – Yohaï-Eliel Berreby Jul 16 '14 at 14:35
  • I've looked into is, trying to place a c-string in an unorderd_map safely gets rather hairy fast, as you need to rewrite hash, equals and an allocator (the allocator is the hairy part). It should be possible, but I feel you probably could do better using something else. Have you considered simply replacing the strings with enums or global constants? – Stian Svedenborg Jul 16 '14 at 15:20
  • @StianV.Svedenborg Yes, but it would be tricky to initially map the string read from the JSON file to a constant / enum. And using constants / enums would force me to recompile every file using the actor-component system every time I add a component. No, I'll use the comparator, and *if it doesn't work well*, look at another solution. – Yohaï-Eliel Berreby Jul 16 '14 at 17:57
  • @filsmick Feel free to let me know how it works out, I believe you will hit the same allocator-problem as for the hash-map, but if your compiler is optimizing away const char pointers... Maybe... – Stian Svedenborg Jul 16 '14 at 22:01
  • @filsmick: You never came up with a **sound technical reason** to use string literals as keys. Even though hashtables are the logical solution, repeated runtime hashing of compile time constants is just insane in high-performance code. – MSalters Jul 17 '14 at 07:01
  • @MSalters Okay, here's how it works. I have a big JSON file containing the component names for specific types of actors, and the data with which to initialize these components (eg., initial coordinates for a Position component). I also have three classes: ActorFactory (singleton), Actor and Component. When the game is run, the ActorFactory constructor loads the big JSON file and parses it. The parsed data is kept alive in memory as a member variable. When you want to create an actor, you call ActorFactory::createActor(const char*), passing it the name of the kind of actor you want – Yohaï-Eliel Berreby Jul 17 '14 at 09:45
  • [continued]. This method will search the JSON file for the name you passed it. If it finds it, it creates a shared_ptr. For each component the actor has, it calls a component creator, a function pointer stored in a map with string keys. It *needs* to use string keys, even if it is in an indirect way. If it didn't, I would have to map strings to compile-time constants. The component creator gets the JSON data of a specific component and initializes a component with it. The component is added to the map we were talking about at the beginning, so that we can access it by name later. – Yohaï-Eliel Berreby Jul 17 '14 at 09:54
  • So I *can* use compile-time constants. It will save me nothing with the component creators map, but maybe will help a bit with the component map of the actors. I'll certainly use string literals in debug stage and constants in release. – Yohaï-Eliel Berreby Jul 17 '14 at 09:57

1 Answers1

2

You can provide your own Comparator class as template parameter for std::map that implements the comparison based on the c-style strings (e.g. using std::strcmp)

If you have fixed string values to compare with, and a fixed number of them, you'll probably be better off calculating hash values for them, and use these as keys in the map. This could speed up searching a lot, though you have to calculate the hash value for a search input.

πάντα ῥεῖ
  • 1
  • 13
  • 116
  • 190
  • Generally, supplying comparator alone is not going to be sufficient (although it might be sufficient in many practical cases). – Sergey Kalinichenko Jul 16 '14 at 12:52
  • @dasblinkenlight In which cases wouldn't it be sufficient? – Yohaï-Eliel Berreby Jul 16 '14 at 13:02
  • @filsmick I think dasblinkenlight bothered about the performance aspects, those will be more or less the same no matter if `strcmp()` or `std::string` is used. If performance is important for you, you should go with the calculated hash idea. – πάντα ῥεῖ Jul 16 '14 at 13:05
  • @πάνταῥεῖ `strcmp()` will still be more efficient because there won't be all those `const char*` -> `string` conversions. I'll look at the hash solution if the comparator doesn't work well. – Yohaï-Eliel Berreby Jul 16 '14 at 13:06