
I am developing a C++ library that uses unordered containers. These require a hasher (usually a specialization of the template structure std::hash) for the types of the elements they store. In my case, those elements are classes that encapsulate string literals, similar to conststr of the example at the bottom of this page. The STL offers a specialization for constant char pointers, which, however, only hashes the pointer value, as explained here, in the 'Notes' section:

There is no specialization for C strings. std::hash<const char*> produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.

Although this is very fast (or so I think), the C++ standard does not guarantee that several equal string literals are stored at the same address, as explained in this question. If they aren't, the first condition of hashers wouldn't be met:

For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2)

I would like to selectively compute the hash using the provided specialization if the aforementioned guarantee is given, or using some other algorithm otherwise. Although falling back to asking those who include my headers or build my library to define a particular macro is feasible, an implementation-defined one would be preferable.
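The selection described above might be sketched as follows. The macro name `MYLIB_LITERALS_SHARE_ADDRESS` and the `literal_ref` wrapper are hypothetical names invented for this illustration; the macro stands in for whatever the user or the implementation would define. The fallback is 64-bit FNV-1a over the characters, which is just one possible content hash, not a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_set>

// Hypothetical wrapper around a pointer to a string literal.
struct literal_ref {
    const char* ptr;
};

// Equality must agree with the hash: compare contents unless equal
// strings are known to share one address.
inline bool operator==(literal_ref a, literal_ref b) {
#ifdef MYLIB_LITERALS_SHARE_ADDRESS   // hypothetical macro
    return a.ptr == b.ptr;
#else
    return std::strcmp(a.ptr, b.ptr) == 0;
#endif
}

struct literal_hash {
    std::size_t operator()(literal_ref s) const {
#ifdef MYLIB_LITERALS_SHARE_ADDRESS
        // Fast path: hash the address only.
        return std::hash<const char*>()(s.ptr);
#else
        // Fallback: 64-bit FNV-1a over the characters.
        std::uint64_t h = 14695981039346656037ull;
        for (const char* p = s.ptr; *p != '\0'; ++p) {
            h ^= static_cast<unsigned char>(*p);
            h *= 1099511628211ull;
        }
        return static_cast<std::size_t>(h);
#endif
    }
};

// Inserts two equal strings held in distinct buffers and returns the set size.
inline std::size_t demo_distinct_buffers() {
    static const char a[] = "abc";
    static const char b[] = "abc";   // a different object from a
    std::unordered_set<literal_ref, literal_hash> set;
    set.insert(literal_ref{a});
    set.insert(literal_ref{b});
    return set.size();
}
```

With the macro undefined, the two buffers count as one key because both the hash and the equality look at the characters. With the macro defined, they would count as two keys, which is why the fast path is only safe when the address guarantee really holds.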

Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?

An example:

#ifdef __GXX_SAME_STRING_LITERALS_SAME_ADDRESS__
const char str1[] = "abc";
const char str2[] = "abc";
assert( str1 == str2 );
#endif
djsp
  • Certainly not, because it is not only about "*equal strings stored at the same address*", but also about multiple strings stored as substrings of a bigger string, etc. For example, given two literals `"world"` and `"hello world"`, the compiler could generate code like `.data: byte STR { h , e , l , l , o , , w , o , r , l , d }` referencing the first as `STR + 6` and the second as `STR`. – Manu343726 Aug 29 '14 at 21:20
  • Even if string literals are coalesced, two `char[]` variables will not be. `str1 == str2` will never be true. – Igor Tandetnik Aug 29 '14 at 21:20
  • Can't you use `std::string` for your container? Or char arrays? – quantdev Aug 29 '14 at 21:22
  • if you declare the strings globally and always hash using those globals, you'll be fine unless the globals are being compared across module (so,dll,dylib,etc...) boundaries – cppguy Aug 29 '14 at 21:26
  • @quantdev I chose not to use `std::string` to avoid the allocation, the exception hole, and the copy (they all would be there, right?), and to be a little less dependent upon the STL. As for char arrays, my class stores a `const char *`, as in the `conststr` example linked in the question body. – djsp Aug 29 '14 at 21:26
  • @Manu343726 In the case you explain, `STR` (referencing `"hello world"`) and `STR + 6` (referencing `"world"`) would compare different, just as the mentioned strings. Using the pointers would thus be valid, wouldn't it? If there was another string literal `"world"`, the compiler could store it as `STR + 6` as well, and the thing would continue working. – djsp Aug 29 '14 at 21:39
  • @Manu343726 You don’t seem to be accounting for null terminators in your example. – Luc Danton Aug 31 '14 at 03:36
  • Possible duplicate of [String Literal address across translation units](https://stackoverflow.com/q/26279628/608639) and [Addresses of two char pointers to different string literals are same](https://stackoverflow.com/q/19088153/608639) – jww Oct 17 '18 at 01:26

3 Answers

6

Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?

  • gcc has the -fmerge-constants option:

Attempt to merge identical constants (string constants and floating-point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

  • Visual Studio has String Pooling (/GF option : "Eliminate Duplicate Strings")

String pooling allows what were intended as multiple pointers to multiple buffers to be multiple pointers to a single buffer. In the following code, s and t are initialized with the same string. String pooling causes them to point to the same memory:

char *s = "This is a character buffer";
char *t = "This is a character buffer";

Note: although MSDN uses char* for string literals, const char* should be used.

  • clang apparently also has the -fmerge-constants option, but I can't find much about it except in the --help output, so I'm not sure whether it really is the equivalent of gcc's:

Disallow merging of constants


Anyway, how string literals are stored is implementation dependent (many do store them in the read-only portion of the program).
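A minimal sketch of what "implementation dependent" means here: the first comparison below may go either way depending on the compiler and on options such as the ones quoted above, while the second is always false because two named arrays are distinct objects. The function names are made up for this illustration.

```cpp
#include <cassert>

// Whether two identical literals share storage is unspecified, so the
// result may differ between compilers and optimization levels.
inline bool literals_share_address() {
    const char* s = "This is a character buffer";
    const char* t = "This is a character buffer";
    return s == t;
}

// Two named arrays are separate objects with separate addresses, even when
// their contents are equal, so this returns false.
inline bool arrays_share_address() {
    static const char a[] = "abc";
    static const char b[] = "abc";
    return static_cast<const void*>(a) == static_cast<const void*>(b);
}
```

Code that relies on `literals_share_address()` returning true is relying on a build setting, not on the language.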

Rather than building your library on implementation-dependent hacks, I can only suggest using std::string instead of C-style strings: they will behave exactly as you expect.

You can construct your std::string in-place in your containers with the emplace() methods :

    std::unordered_set<std::string> my_set;
    my_set.emplace("Hello");
quantdev
  • `std::string` brings with it dynamic allocation and exception danger, although it indeed seems to be the only standard solution, aside from reinventing the wheel, i.e., the hashing algorithm, for my classes. – djsp Aug 29 '14 at 22:19
  • @Kalrish : yes, but the dynamic allocation impact will be limited since your strings are known at compile time (i.e. you can allocate all your strings when the application starts). It still looks like the only clean and portable way. I hope this helps. – quantdev Aug 30 '14 at 01:08
  • `char *s = "This is a character buffer";` is no longer valid in C++ (breaking change in C++11). It has to be `const char *s = "This is a character buffer";`. – user515430 Aug 30 '14 at 15:38
  • @user515430 Sure, but this is a *quote* from MSDN, not a code suggestion. – quantdev Aug 31 '14 at 17:12
  • @quantdev You are responsible for what you post. No? – user515430 Aug 31 '14 at 19:04
  • @user515430 When you **quote** a text, you don't modify it, this is why I used "quote marks" : ) . Added a note and thx for the point. – quantdev Aug 31 '14 at 19:42
  • @quantdev Happy with your solution now. – user515430 Aug 31 '14 at 20:00
2

Although C++ does not seem to allow for any way that works with string literals, there is an ugly but somewhat workable way around the problem if you don't mind rewriting your string literals as character sequences.

template <typename T, T...values>
struct static_array {
  static constexpr T array[sizeof...(values)] { values... };
};

template <typename T, T...values>
constexpr T static_array<T, values...>::array[];

template <char...values>
using str = static_array<char, values..., '\0'>;

int main() {
  return str<'a','b','c'>::array != str<'a','b','c'>::array;
}

This is required to return zero. The compiler has to ensure that even if multiple translation units instantiate str<'a','b','c'>, those definitions get merged, and you only end up with a single array.

You would need to make sure you don't mix this with string literals, though. Any string literal is guaranteed not to compare equal to any of the template instantiations' arrays.

  • Thanks! Unfortunately, string literals would be _way_ more comfortable to use in my library. I wonder, though, if they could be converted at compile time to character sequences... – djsp Aug 30 '14 at 14:18
  • @Kalrish String literals cannot be used as template arguments, and although string literals can be passed to `constexpr` functions, and array indexing is allowed on string literals in constant expressions, an indexing operation on a `constexpr` function parameter doesn't qualify as a constant expression. The best I can come up with is horribly abusing the preprocessor and forcing a lot of unnecessary template instantiations: `#define CHAR_AT(s, i) ((i) < sizeof(s) ? (s)[i] : '\0')` / `#define STR(s) (sizeof (s) == 1 ? str<>::array : sizeof (s) == 2 ? str<CHAR_AT(s, 0)>::array : ...)` –  Aug 30 '14 at 15:08
  • You'd need to extend that `STR` macro to support the longest string you'll actually be using, and then use `STR("abc")`. I think it's a very bad idea, but it's the only thing I can come up with that allows string literals. –  Aug 30 '14 at 15:09
  • As they won't be modified, only read (for hashing/comparison, mainly), I think I'll go with them as `constexpr` function parameters (they anyway need to be stored somewhere, even if hashed at compile-time). Again, thank you, but that macro really scares me :). – djsp Aug 30 '14 at 15:16
  • @Kalrish : [`boost::mpl::string<>`](http://www.boost.org/libs/mpl/doc/refmanual/string.html) can do that, but it has a smallish size limitation (32 characters by default). – ildjarn Aug 30 '14 at 18:34
  • FYI, your comments are being discussed in [this question](http://stackoverflow.com/q/25917473/1708801) you may wish to add your own input. – Shafik Yaghmour Sep 18 '14 at 17:44
  • @ShafikYaghmour Thanks, I hope I managed to clarify what I meant. –  Sep 18 '14 at 17:58
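The route djsp settles on in the comments above, passing the literal to a `constexpr` function and hashing its contents, might look roughly like this. It is only a sketch: the function name `fnv1a` and the choice of 64-bit FNV-1a are assumptions made for illustration, and the recursion is there so that the function is valid under C++11 `constexpr` rules.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit FNV-1a over a null-terminated string, written recursively so that
// it qualifies as a C++11 constexpr function (single return statement).
constexpr std::uint64_t fnv1a(const char* s,
                              std::uint64_t h = 14695981039346656037ull) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1, (h ^ static_cast<unsigned char>(*s)) * 1099511628211ull);
}

// Evaluated at compile time: the hash depends on contents, not on addresses.
static_assert(fnv1a("abc") == fnv1a("abc"), "equal contents give equal hashes");
static_assert(fnv1a("abc") != fnv1a("abd"), "these two strings hash differently");
```

Because the hash is computed from the characters, it does not matter whether the compiler merges identical literals; the pointer still has to be stored for equality comparison, which then needs to compare contents as well.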
1

The tacklelib C++11 library has a macro that, together with the tmpl_string class, holds a string literal as a template class instance. The tmpl_string contains a static string with the same content, which guarantees the same address for the same template class instance.

https://github.com/andry81/tacklelib/blob/master/include/tacklelib/tackle/tmpl_string.hpp

Tests:

https://github.com/andry81/tacklelib/blob/master/src/tests/unit/test_tmpl_string.cpp

Example:

const auto s = TACKLE_TMPL_STRING(0, "my literal string");

I've used it in another macro to conveniently and consistently extract a literal string begin/end:

#include <tacklelib/tackle/tmpl_string.hpp>
#include <tacklelib/utility/string_identity.hpp>

//...

std::vector<char> xml_arr;

xml_arr.insert(xml_arr.end(), UTILITY_LITERAL_STRING_WITH_BEGINEND_TUPLE("<?xml version='1.0' encoding='UTF-8'?>\n"));

https://github.com/andry81/tacklelib/blob/master/include/tacklelib/utility/string_identity.hpp

Andry