
Using MS Visual Studio 17.2.5, I declared std::string vc = "test";. In a debug session, I obtained the address of this string via &vc (e.g. 0x008FF8C4), but when I put this address into the Debug Memory window, I see that my string is stored at an offset from the obtained address, like this:

0x008FF8C4  *30 9a b1 00* **74 65 73 74 00** cc cc cc cc cc cc cc cc cc cc cc

The string representation of this sequence is 0љ±.test. This means that the string ("test", encoded as the sequence 74 65 73 74 00) is stored at a small offset from the address I obtained for it. I expected the proper address to be 0x008FF8C8 (+4), which points exactly to this string. I can't figure out why the address of the variable is wrong. Please advise what "30 9a b1 00" means.
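To reproduce what the Memory window shows without the debugger, the raw bytes of the std::string object itself can be printed from the program. This is a minimal sketch; the helper names `object_bytes` and `dump_object` are hypothetical, and the bytes printed depend entirely on the standard library implementation and build configuration, so they will not necessarily match the dump above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: copies the object representation of the
// std::string *object itself* (sizeof(std::string) bytes starting at &s).
// Whether the characters appear in it depends on the implementation.
std::string object_bytes(const std::string& s) {
    const char* first = reinterpret_cast<const char*>(&s);
    std::string raw(first, sizeof(s));
    assert(raw.size() == sizeof(std::string));
    return raw;
}

// Hypothetical helper: prints the object's address and its bytes in hex,
// roughly like one row of the Visual Studio Memory window.
void dump_object(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&s);
    std::printf("%p ", static_cast<const void*>(p));
    for (std::size_t i = 0; i < sizeof(s); ++i) {
        std::printf(" %02x", static_cast<unsigned>(p[i]));
    }
    std::printf("\n");
}
```

On common implementations, calling dump_object(vc) right after std::string vc = "test"; shows 74 65 73 74 00 somewhere inside the object, surrounded by whatever other data the library keeps in it.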

JaMiT
reader
  • Does this answer your question? [What are the mechanics of short string optimization in libc++?](https://stackoverflow.com/questions/21694302/what-are-the-mechanics-of-short-string-optimization-in-libc) This is another tool-chain's implementation details but design concepts also apply to MSVC. – Richard Critten Aug 21 '22 at 20:29
  • @RichardCritten Not sure the implementation of SSO in MSVC is the same as in LLVM. – bitmask Aug 21 '22 at 20:31
  • @bitmask was just adding that. I think the concepts are similar though. – Richard Critten Aug 21 '22 at 20:31
  • Relevant, but not asking the same question: [std::string vs. char*](https://stackoverflow.com/questions/2672346/) – JaMiT Aug 21 '22 at 21:31
  • For another experiment, initialize your string to a longer string literal, let's say 36 characters (could be 26 letters plus 10 digits, if you don't want to count them). Do you see the string anywhere when you do the same debugging process? – JaMiT Aug 21 '22 at 21:33

1 Answer


std::string is a management object for text. If the text is long, it is dynamically allocated on the heap, but if it's short, it is embedded inside the std::string management object itself. This saves the runtime overhead of dynamic memory allocation and deallocation, and avoids touching additional cache lines. This is known as the Short String Optimisation (SSO).
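One way to check which case applies, and to run the 36-character experiment suggested in the comments, is to test whether data() points into the bytes of the std::string object. This is a minimal sketch; stored_inline is a hypothetical helper, and the inline capacity it probes is implementation-specific (typically 15 characters for MSVC and libstdc++, 22 for 64-bit libc++).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: reports whether s.data() points into the bytes of
// the std::string object itself, i.e. whether the text is stored inline
// (short string optimisation) rather than in a separate heap allocation.
// Addresses are compared as integers to avoid comparing unrelated pointers.
bool stored_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj && data < obj + sizeof(s);
}
```

With "test" the function returns true on common implementations, while a 36-character string such as "abcdefghijklmnopqrstuvwxyz0123456789" exceeds the inline capacity and ends up on the heap.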

You're seeing where in the management object the text gets embedded. To differentiate cases where the text is embedded from those where it is on the heap (and the management object holds a pointer to it), there must be some data in the management object whose value differs between the two cases. Bookkeeping data of this kind is likely what occupies the first few bytes of your management object, though the exact layout is implementation-specific and, with MSVC, can differ between debug and release builds (debug builds add iterator-debugging support).
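To make the idea of a discriminating field concrete, here is a toy small-string class. It is a hypothetical illustration, not the layout of MSVC's or any other real std::string: short text is copied into an in-object buffer, long text goes to the heap, and in this sketch the capacity field is what tells the two cases apart (real libraries use various encodings).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy class: short text lives in an in-object buffer,
// long text lives on the heap, and capacity_ distinguishes the two.
class TinyString {
public:
    static constexpr std::size_t kInlineCapacity = 15;  // chars, excluding '\0'

    explicit TinyString(const char* text) : size_(std::strlen(text)) {
        if (size_ <= kInlineCapacity) {
            capacity_ = kInlineCapacity;  // inline mode: text is inside the object
            std::memcpy(storage_.buf, text, size_ + 1);
        } else {
            capacity_ = size_;            // heap mode: object holds a pointer
            storage_.ptr = new char[size_ + 1];
            std::memcpy(storage_.ptr, text, size_ + 1);
        }
        assert(c_str()[size_] == '\0');
    }

    ~TinyString() {
        if (!is_inline()) delete[] storage_.ptr;
    }

    // Copying is disabled to keep the sketch minimal.
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    // The discriminator: any capacity above the inline limit means "heap".
    bool is_inline() const { return capacity_ <= kInlineCapacity; }

    const char* c_str() const { return is_inline() ? storage_.buf : storage_.ptr; }
    std::size_t size() const { return size_; }

private:
    union Storage {
        char buf[kInlineCapacity + 1];  // in-object buffer, room for '\0'
        char* ptr;                      // heap pointer when the text is long
    } storage_;
    std::size_t size_;
    std::size_t capacity_;
};
```

The buffer and the heap pointer share the same bytes through the union, which is why a separate field is needed to know how to interpret them.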

Note that if you instead declared static constexpr char vc[] = "test";, you'd find just the "test\0" text at &vc.
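For comparison, a minimal sketch of the array case: a plain character array has no management data, so sizeof(vc) is exactly the five bytes of "test\0" and the bytes at &vc are the text itself. The helper array_holds_only_text is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// A plain character array: its object representation is exactly the
// characters plus the terminating '\0', with no bookkeeping data.
static constexpr char vc[] = "test";
static_assert(sizeof(vc) == 5, "4 characters + terminating null");

// Hypothetical helper: true if the bytes at &vc are exactly "test\0".
bool array_holds_only_text() {
    return std::memcmp(&vc, "test", sizeof(vc)) == 0;
}
```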

Tony Delroy