
I know that in C++03, technically the std::basic_string template is not required to have contiguous memory. However, I'm curious how many implementations exist for modern compilers that actually take advantage of this freedom. For example, if one wants to use basic_string to receive the results of some C API (like the example below), it seems silly to allocate a vector just to turn it into a string immediately.

Example:

DWORD valueLength = 0;
DWORD type;
LONG errorCheck = RegQueryValueExW(
        hWin32,
        value.c_str(),
        NULL,
        &type,
        NULL,
        &valueLength);

if (errorCheck != ERROR_SUCCESS)
    WindowsApiException::Throw(errorCheck);
else if (valueLength == 0)
    return std::wstring();

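std::wstring buffer;
do
{
    // Round up so the buffer can hold valueLength bytes even if it is odd.
    buffer.resize((valueLength + sizeof(wchar_t) - 1) / sizeof(wchar_t));
    errorCheck = RegQueryValueExW(
            hWin32,
            value.c_str(),
            NULL,
            &type,
            reinterpret_cast<LPBYTE>(&buffer[0]), // lpData is declared as LPBYTE
            &valueLength);
} while (errorCheck == ERROR_MORE_DATA);

if (errorCheck != ERROR_SUCCESS)
    WindowsApiException::Throw(errorCheck);

// valueLength now holds the number of bytes actually written: drop any
// unused tail, then the terminating null(s) the registry stored.
buffer.resize(valueLength / sizeof(wchar_t));
while (!buffer.empty() && buffer[buffer.size() - 1] == L'\0')
    buffer.erase(buffer.size() - 1);

return buffer;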

I know code like this might slightly reduce portability because it implies that std::wstring is contiguous, but I'm wondering just how unportable that makes this code. Put another way, how many compilers actually take advantage of the freedom that noncontiguous memory allows?


EDIT: I updated this question to mention C++03. Readers should note that when targeting C++11, the standard requires that basic_string be contiguous, so the above question is a non-issue when targeting that standard.
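To make the trade-off concrete, here is a minimal, portable sketch of both approaches (C++11 syntax). `hypothetical_query` is a made-up stand-in for a size-then-fill C API like RegQueryValueExW, not a real function. The vector version relies only on std::vector being contiguous; the direct std::wstring version relies on the basic_string contiguity that C++11 guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a size-then-fill C API such as RegQueryValueExW.
// If `buffer` is null or `*byteCount` is too small, it stores the required
// size in bytes in `*byteCount` and returns false; otherwise it copies the
// data (including a terminating null) and returns true.
bool hypothetical_query(wchar_t* buffer, std::size_t* byteCount)
{
    static const wchar_t data[] = L"hello";
    const std::size_t needed = sizeof(data);
    if (buffer == nullptr || *byteCount < needed) {
        *byteCount = needed;
        return false;
    }
    std::memcpy(buffer, data, needed);
    *byteCount = needed;
    return true;
}

// Strategy 1: fill a vector (contiguous since C++03), then copy into a string.
std::wstring via_vector()
{
    std::size_t bytes = 0;
    hypothetical_query(nullptr, &bytes);               // ask for the size
    std::vector<wchar_t> vec(bytes / sizeof(wchar_t));
    hypothetical_query(vec.data(), &bytes);            // fill the vector
    std::size_t count = bytes / sizeof(wchar_t);
    while (count > 0 && vec[count - 1] == L'\0')       // drop the terminator
        --count;
    return std::wstring(vec.begin(),                   // the extra copy
                        vec.begin() + static_cast<std::ptrdiff_t>(count));
}

// Strategy 2: fill the string directly through &buffer[0]
// (relies on contiguous basic_string storage, guaranteed since C++11).
std::wstring via_string()
{
    std::size_t bytes = 0;
    hypothetical_query(nullptr, &bytes);
    std::wstring buffer(bytes / sizeof(wchar_t), L'\0');
    hypothetical_query(&buffer[0], &bytes);
    buffer.resize(bytes / sizeof(wchar_t));
    while (!buffer.empty() && buffer[buffer.size() - 1] == L'\0')
        buffer.erase(buffer.size() - 1);               // drop the terminator
    return buffer;
}
```

Both functions return the same string; the only difference is the extra vector-to-string copy in via_vector, which is the cost the question is trying to avoid.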

Billy ONeal
  • Unless you're certain that MSVC is successfully giving you the RVO (even though you have two different returns, one a temporary and one a variable name), then you're not "allowed" to worry about an extra copy ;-) – Steve Jessop Feb 13 '10 at 02:08
    I don't believe RVO would optimize a copy between vector and string.... – Billy ONeal Feb 13 '10 at 02:14
  • What I mean is that if the current code has no RVO then it's "create string. Copy it to the return value". You're talking maybe 50% more copying if you change that to "create vector. Copy it to string. Copy it to return value". Or maybe no extra copying at all if you do `return std::wstring(vec.begin(), vec.end());` and get "create vector. Copy it to return value (via RVO)". I'd worry about whether I could detect the speed difference before I worried about how portable the resulting code was. But that's just this example, which is why it's a comment not an answer. – Steve Jessop Feb 13 '10 at 02:31
  • If you rely on undefined behavior, please comment it. That way the bug it causes when you port the code, or when the underlying implementation changes, can be found later. Even if the chance of that approaches zero, do it anyway. Better to spend five slightly annoying seconds now than multiple painful hours later. Also, this sounds like pre-mature optimization at the cost of correctness. – Merlyn Morgan-Graham Feb 13 '10 at 03:01
  • Agreed on the commenting thing. I'm not sure about pre-mature optimization -- without the extra copy, the running time for the method seems to be cut in half. 50% improvement does not sound like pre-mature optimization to me. – Billy ONeal Feb 13 '10 at 03:12
  • Depends what proportion of your application's time is spent reading registry values. 50% of nothing is nothing. 50% of half an hour is a coffee break. – Steve Jessop Feb 13 '10 at 03:27
  • Perhaps. But when I've seen the phrase "pre-mature optimization" I usually associate it with doing things like declaring variables outside of FOR loops or using ++x instead of x++ -- namely cases where the compiler typically does that kind of thing for you. This particular app does nothing but read registry values and print them out; therefore this kind of a change significantly improves this program's performance. – Billy ONeal Feb 13 '10 at 05:43
  • Oh -- and this example was one I yanked out of code I happened to be working on at the time -- I have an interface like this for `std::wstring` for almost every win32 api call in my program. – Billy ONeal Feb 13 '10 at 05:52
    Premature is optimising before you have established that (a) the existing program is too slow, and (b) this bit of code is responsible for a significant part of the time. Since you've proved that it speeds up your app it's not premature, and IMO you're right to investigate whether it's safe. If you had done it on the basis that it speeds up this one function, not measuring whether that function is responsible for 90% of your runtime or 0.001%, then it would be premature. I'd naively guess that finding values in the registry is way slower than copying them, but apparently not. – Steve Jessop Feb 13 '10 at 12:35
  • @BillyONeal: premature optimization is when you optimize code without knowing if it **matters**. If the function is called once per second, and takes two nanoseconds to execute, then even your 50% improvement is completely and utterly unmeasurable and a complete waste of time. The metric that matters is not "how many percent of its original running time does it take now", but "how many percent of the **application's** running time is spent in the function now" – jalf Apr 02 '10 at 01:54
  • @Jalf: In this application's case, it does matter. More to the point, this pattern applies to most any other Win32API function that returns a string, not just registry functions. – Billy ONeal Apr 02 '10 at 02:19
  • @BillyONeal: Ok, but then that is your argument for performing the optimization. My point was simply that your earlier comment of "50% improvement does not sound like pre-mature optimization to me" is mistaken. It can still be premature, even if it reduces the running time of the function by 95%. – jalf Apr 02 '10 at 22:51

5 Answers


I'd consider it quite safe to assume that std::string allocates its storage contiguously.

At the present time, all known implementations of std::string allocate space contiguously.

Moreover, the current draft of C++0x (N3000) requires that the space be allocated contiguously (§21.4.1/5):

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

As such, the chances of a current or future implementation of std::string using non-contiguous storage are essentially nil.
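The identity in the quoted paragraph can be spot-checked at run time. This is only a sketch (`is_contiguous` is a hypothetical helper name), and a passing check on one implementation says nothing about others, which is exactly why the standard's wording matters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Checks, for every valid n, the identity quoted from the draft:
//   &*(s.begin() + n) == &*s.begin() + n
template <typename String>
bool is_contiguous(const String& s)
{
    typedef typename String::difference_type diff_t;
    for (std::size_t n = 0; n < s.size(); ++n) {
        // Address reached through the iterator vs. plain pointer arithmetic.
        if (&*(s.begin() + static_cast<diff_t>(n)) != &*s.begin() + n)
            return false;
    }
    return true;   // an empty string passes trivially
}
```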

Jerry Coffin
    "all known implementations". In particular, all that matters for a WinAPI call is the various versions of Windows. So "all known implementations" might actually be "all implementations". – Steve Jessop Feb 13 '10 at 02:15
    @Steve Jessop: Not really, `std::basic_string` is a compiler feature, not a Windows feature. What version of windows the compiled code runs on really doesn't matter here. – Billy ONeal Feb 13 '10 at 02:18
    Fair point. You could perhaps say, though, "this code is only supported on Microsoft compilers". Still not strictly the same as Windows versions, but the point is that you only have to worry about a fixed set of implementations. Future MS compilers will support most or all of C++0x. – Steve Jessop Feb 13 '10 at 02:30
  • @Steve: In this case, the "all known implementations" was (at least if I recall correctly) "all implementations known to anybody on the committee." Given that virtually every C++ implementer is represented, it probably does mean all (publicly available) implementations. If you want to get technical, I modified SGI's rope class to conform (or at least get really close) years ago, but I'm pretty sure no eyes but mine have ever looked at that, so it hardly counts (and I haven't used it or even looked at it in years, so I certainly don't care). – Jerry Coffin Feb 13 '10 at 03:15
  • The quote I remember is Herb Sutter saying this on a blog, and was the result of a straw poll done spontaneously. So nobody present could recall a rope-style std::string. Good enough for most practical purposes, and I guess that nobody planning to support C++0x has complained. Doesn't actually *prove* someone didn't forget something, or arrive late that day, or whatever. Once a rope-style implementation is less likely than an implementation that's non-compliant due to bugs, I guess you can argue it doesn't matter any more. It's just that with a limited scope you can do even better. – Steve Jessop Feb 13 '10 at 03:23
    @Steve: Keep in mind, however, the requirement's been there since at least N2284 (05/07/2007), so by now there's been plenty of time for anybody who wasn't there to speak up, but nobody has. Admittedly, that isn't a proof that it hasn't been done, but it does seem like pretty decent evidence that if anybody's using the current leeway, they still think contiguous allocation offers more benefit. – Jerry Coffin Feb 13 '10 at 04:47

A while back there was a question about being able to write to the storage for a std::string as if it were an array of characters, and it hinged on whether the contents of a std::string were contiguous.

My answer indicated that, according to a couple of well-regarded sources (Herb Sutter and Matt Austern), the current C++ standard does require std::string to store its data contiguously under certain conditions (once you call str[0], assuming str is a std::string), and that this fact pretty much forces the hand of any implementation.

Basically, if you combine the promises made by string::data() and string::operator[]() you conclude that &str[0] needs to return a contiguous buffer. Therefore Austern suggests that the committee just make that explicit, and apparently that's what'll happen in the 0x standard (or are they calling it the 1x standard now?).

So strictly speaking an implementation doesn't have to implement std::string using contiguous storage, but it has to do so pretty much on demand. And your example code does just that by passing in &buffer[0].
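A minimal sketch of the pattern this answer describes: size the string first, then hand &str[0] to a C-style function. `legacy_fill` is a hypothetical stand-in for such an API, and the code assumes C++11 or later, where contiguity is guaranteed rather than merely implied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical legacy C-style API: writes exactly `len` characters into `out`.
void legacy_fill(char* out, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<char>('a' + i % 26);
}

// Size the string first, then let the legacy API write into &str[0].
// Since C++11, &str[0] == str.data() for a non-empty string and the
// characters are contiguous, so the API sees one flat buffer.
std::string fill_via_index(std::size_t len)
{
    std::string str(len, '\0');
    if (len > 0)                  // &str[0] of an empty string is not a writable buffer
        legacy_fill(&str[0], len);
    return str;
}
```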

Michael Burr

Edit: You want to call &buffer[0], not buffer.data(), because [] returns a non-const reference and does notify the object that its contents can change unexpectedly.


It would be cleaner to do buffer.data(), but you should worry less about contiguous memory than memory shared between structures. string implementations can and do expect to be told when an object is being modified. string::data specifically requires that the program not modify the internal buffer returned.

There is a VERY high chance that some implementation will share one buffer among all strings that are uninitialized apart from having their length set to 10 or whatever.

Use a vector or even an array with new[]/delete[]. If you really can't copy the buffer, legally initialize the string to something unique before changing it.
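The distinction between buffer.data() and &buffer[0] shows up in the types. A minimal sketch (`overwrite_first` is a hypothetical stand-in for a C API that writes through a char*): data() on a const string yields const char*, while non-const operator[] yields char&, so &buffer[0] is a writable char* without any const_cast. C++17 later added a char* overload of data() for non-const strings.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// data() on a const string returns const char*; the program must not
// write through it.
static_assert(std::is_same<decltype(std::declval<const std::string&>().data()),
                           const char*>::value,
              "const std::string::data() returns const char*");

// operator[] on a non-const string returns char&, so &buffer[0] is
// already a char* and needs no const_cast.
static_assert(std::is_same<decltype(std::declval<std::string&>()[0]), char&>::value,
              "non-const std::string::operator[] returns char&");

// Hypothetical helper standing in for a C API that writes into a char*.
void overwrite_first(char* p, char c)
{
    *p = c;
}
```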

Potatoswatter
  • That's why I'm calling `std::basic_string::resize` first. Calling resize essentially forces a reallocation of the underlying buffer that the string object is using. See Scott Meyers, Effective STL Item #16: "Know how to pass vector and string data to legacy APIs." – Billy ONeal Feb 13 '10 at 19:42
  • @Billy: I saw what you're doing. "Essentially forces" is not "guarantees." From the implementation's perspective, you have a number of objects which *should* contain all zeroes, and it was never given a chance to see whether they do or don't because you never called a non-`const` member function after `resize`. – Potatoswatter Feb 13 '10 at 22:17
  • Umm.. `resize` itself is a non-const member function. Calling resize forces the implementation to allocate and default construct the values in the string -- resize modifies the string itself. Therefore, even in a reference counted implementation, the string must be created from scratch, because the content of the string has changed. I believe you are confusing `resize` with `reserve` here. `Reserve` changes the underlying allocation but not the data, so it's possible that an implementation might share. But `resize` changes both, ergo no sharing. – Billy ONeal Feb 14 '10 at 00:49
  • See example here: http://www.cplusplus.com/reference/string/string/resize/ Note that the string _content_, not just the underlying representation, is changed with the call to `resize`. – Billy ONeal Feb 14 '10 at 00:59
  • @Billy: For a type like `char`, there is no pretense of calling a constructor. The implementation can keep a zeroed-out scratch space to store all small strings `resize`'d from empty. (Not to say it's a great idea.) But otherwise I was completely wrong, answer reversed. (But, calling *both* `data` and `[]` would unambiguously satisfy both requirements of a modifiable contiguous buffer.) – Potatoswatter Feb 14 '10 at 02:04
  • @Potatoswatter: Yes, for a type like char, there is a pretense of calling a constructor. `char()` is the default constructor for a char, and it returns `'\0'`. And the implementation can't keep a separate space like you describe, because `void resize ( size_t n );` calls `resize(n,char())`, which means the function doing the work (`void resize ( size_t n, char c )`) can't make those kinds of assumptions regarding how it is constructed. More importantly, the basic_string type must be general and has to deal with the fact that it's entirely legal to make a std::basic_string<MyClass>. – Billy ONeal Feb 14 '10 at 03:31
  • @Billy: `resize` is implemented by a template which can inspect `char_type` and optimize appropriately. C++ is generally good about allowing assumptions to be made. Moreover, the runtime can sweep all `string`s and collect identical ones into common buffers, except ones to which the user got an element reference. – Potatoswatter Feb 14 '10 at 03:50
  • As for the constructor pretense, see the top of §3.9 for how POD objects are allowed to be constructed; `string` can only hold PODs. – Potatoswatter Feb 14 '10 at 04:02
  • Even PODs with constructors with side effects could probably be shoehorned into a sneaky optimization scheme if it simply called the destructors at "surprising" times. – Potatoswatter Feb 14 '10 at 04:15
  • Ok. +1 Was assuming `MyClass` was a POD type because you'd somehow have to give it a char_traits tag... – Billy ONeal Feb 14 '10 at 04:28
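The resize/reserve distinction argued in these comments can be checked directly. A minimal sketch (the helper names are hypothetical): reserve leaves the string empty, while resize(n) behaves as resize(n, char()), appending '\0' characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// reserve() may grow the capacity, but it never changes size() or contents.
std::string after_reserve(std::size_t n)
{
    std::string s;
    s.reserve(n);
    return s;                 // s.size() is still 0
}

// resize(n) changes the contents: it behaves as resize(n, char()),
// appending '\0' characters until size() == n.
std::string after_resize(std::size_t n)
{
    std::string s;
    s.resize(n);
    return s;
}
```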

The result is undefined and I would not do it. The cost of reading into a vector and then converting to a string is trivial with modern C++ heaps, compared with the risk that your code will die in Windows 9.

Also, doesn't that need a const_cast on &buffer[0]?

pm100
    String implementations have nothing to do with the windows API and therefore should have nothing to do with what version of windows someone uses. Yes, it's undefined behavior according to the standard. But it's okay for every compiler of which I am aware. I'm curious how many compilers actually take advantage of the latitude the standard gives them. – Billy ONeal Feb 13 '10 at 02:06
  • Typically, new versions of Windows ship with a new version of the C runtime. The point is that undefined means it can change mysteriously in the future, so why take the risk? In practice, I have never seen a string implementation that doesn't lay the string out as a nice classic array. But I still wouldn't do it. – pm100 Feb 13 '10 at 02:09
  • Undefined does NOT mean that it can change mysteriously in the future. Undefined means compilers can implement it however they want. Once the code is compiled, its behavior cannot change unless it calls dynamic libraries. Since string does not call DLLs, future versions of Windows will not break it. (Unless I use a dynamic C runtime; then I suppose it's possible but still unlikely.) I'm not asking if it's a good idea to do this; I'm asking if there are any compilers that care. – Billy ONeal Feb 13 '10 at 02:13
  • `also, doesnt that need a const_cast on &buffer[0]?` <-- No. `std::basic_string::operator[]` returns a non-const reference to the first element in the string. http://cplusplus.com/reference/string/string/operator[]/ – Billy ONeal Feb 13 '10 at 02:20
  • Are we having fun yet? :-) All compilers that I have seen do it the way we both expect. OK? But I still wouldn't do it. – pm100 Feb 13 '10 at 02:27
  • And the title of the question asks how bad it is, and my answer is 'quite'. – pm100 Feb 13 '10 at 02:28
  • As per Jerry Coffin, it seems they are going to change the spec. In that case, it's fine. – pm100 Feb 13 '10 at 02:30
  • I understand that. But my question was not "would you do this" it was "do compilers care". – Billy ONeal Feb 13 '10 at 02:30
  • Another question: is the contiguous memory implied by &buffer[0] guaranteed to be writable? You are only supposed to write to the string through its front door, so even though the string is not const, there is no assurance that the memory is writable. It almost certainly is, but it's not forced to be. – pm100 Apr 02 '10 at 01:21
  • @pm100: I know this is late, but yes, the memory is required to be writable (it is a non-const reference, therefore it must reference writable memory). To make a non-const reference point to const data would be undefined behavior. – Billy ONeal Jan 04 '11 at 19:43

Of course, allocating a vector here is silly. Using std::wstring here is not wise either. It's better to use a char array to call the WinAPI and construct a wstring when returning the value.
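A minimal sketch of this suggestion, using a wide-character array since the value is a wstring. `hypothetical_get_name` is a made-up stand-in for a Win32-style call, and the fixed 256-element buffer is an assumption: a real API whose result can exceed the array would still need a retry path. Note that constructing the wstring on return is itself a copy, so this avoids the heap allocation of a vector, not the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical stand-in for a Win32-style call: copies the value (including
// its terminating null) into `out` if `capacity` is large enough and returns
// the number of characters excluding the null, or 0 if it does not fit.
std::size_t hypothetical_get_name(wchar_t* out, std::size_t capacity)
{
    static const wchar_t name[] = L"example";
    const std::size_t len = sizeof(name) / sizeof(name[0]) - 1;  // 7, excluding the null
    if (capacity < len + 1)
        return 0;                                // too small: nothing written
    std::wmemcpy(out, name, len + 1);            // copy including the null
    return len;
}

std::wstring name_via_array()
{
    wchar_t buffer[256];                         // fixed size: an assumption
    const std::size_t len =
        hypothetical_get_name(buffer, sizeof(buffer) / sizeof(buffer[0]));
    return std::wstring(buffer, len);            // one copy, made on return
}
```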

leon
  • I made this up as an example -- assuming I'm reading a unicode string value from the registry. Use any Win32 function you like and the question is the same. – Billy ONeal Feb 13 '10 at 02:11