17

Is there a way to get the "raw" buffer of a std::string?
I'm thinking of something similar to CString::GetBuffer(). For example, with CString I would do:

CString myPath;  
::GetCurrentDirectory(MAX_PATH+1, myPath.GetBuffer(MAX_PATH));  
myPath.ReleaseBuffer();  

So, does std::string have something similar?

ThinkingStiff
MikMik
  • Possible duplicate of http://stackoverflow.com/questions/7765750/can-you-avoid-using-temporary-buffers-when-using-stdstring-to-interact-with-c – Benj Oct 20 '11 at 14:22
  • See also: [Directly write into char* buffer of std::string](https://stackoverflow.com/q/39200665/4561887) – Gabriel Staples May 24 '22 at 19:15
  • Even better, see: [How to convert a `std::string` to `const char*` or `char*`](https://stackoverflow.com/q/347949/4561887) – Gabriel Staples Jun 03 '22 at 06:01

7 Answers

22

While a bit unorthodox, it's perfectly valid to use std::string as a linear memory buffer. The only caveat is that the standard does not guarantee this before C++11.

std::string s;
char* s_ptr = &s[0]; // get at the buffer (resize() the string first if you intend to write through it)

To quote Herb Sutter,

Every std::string implementation I know of is in fact contiguous and null-terminates its buffer. So, although it isn’t formally guaranteed, in practice you can probably get away with calling &str[0] to get a pointer to a contiguous and null-terminated string. (But to be safe, you should still use str.c_str().)

"Probably" is key here. So, while it's not a guarantee before C++11, in practice you should be able to rely on std::string being a linear memory buffer, and you should assert this in your test suite, just to be sure.

You can always build your own buffer class but when you're looking to buy, this is what the STL has to offer.
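
The test-suite assertions mentioned above could look roughly like the following. This is only a sketch: `buffer_is_linear` and `test_string_is_linear` are hypothetical names made up for illustration, not anything from the standard library. From C++11 on these checks hold by definition, so they mostly matter on older toolchains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): returns true if
// `s` behaves like one linear, null-terminated block of chars.
bool buffer_is_linear(const std::string& s) {
    if (s.empty()) return true;  // no elements to compare
    const char* base = &s[0];
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (&s[i] != base + i) return false;  // element i is not at offset i
    }
    // c_str() should point at the same storage, terminated by '\0'.
    return s.c_str() == base && s.c_str()[s.size()] == '\0';
}

// Something to call from a test suite.
void test_string_is_linear() {
    assert(buffer_is_linear(std::string("Hello world")));
    assert(buffer_is_linear(std::string(4096, 'x')));  // a much longer string
}
```

If either assertion ever fires on your platform, treating `&s[0]` as a writable C buffer is not safe there.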

Gabriel Staples
John Leidegren
  • 2
    str.c_str() returns a const pointer so surely that is just as, if not more dangerous – paulm Jan 14 '14 at 09:52
  • 1
    I guess the problem here is that it is in the implementation details. I'm not in any way suggesting that it is good practice, but if you wish to work around the limitation, you may do so. Even with, say, the small string optimization in place, you're going to obtain a memory location that is safe to write to, given that you respect the bounds of the array. You may not realize that it has returned a pointer to the stack, but this is not something you need to know. I cannot see a situation in which the returned pointer actually points to a guard page (that would result in a page fault if you wrote to it). – John Leidegren Jan 14 '14 at 12:07
  • 1
    This should be accepted answer since C++11. See https://www.tomhuang.com/2011/10/24/using-std-string-as-the-output-buffer-in-c-api.html – Alexandr Zarubkin Dec 04 '18 at 18:18
  • There's a second _huge_ caveat!: if you are *writing into* this `std::string` buffer as though it was a `char *`, you must pre-allocate the buffer size with `s.resize(BUFFER_SIZE)`, or else it is undefined behavior to write into that buffer. `s.reserve(BUFFER_SIZE)` does not cut it. See: https://en.cppreference.com/w/cpp/string/basic_string/operator_at: _"If `pos > size()`, the behavior is undefined."_ So, you use `s.resize()` to fill the buffer with null characters up to that size first. Then, you can write into that buffer like a normal `char *` up to its size. – Gabriel Staples May 24 '22 at 19:11
  • I've [added an answer here](https://stackoverflow.com/a/72485404/4561887) to explain in detail what I said in my last comment. – Gabriel Staples Jun 03 '22 at 06:26
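
To make the `resize()` vs. `reserve()` point from the comments above concrete, here is a minimal sketch that lets `std::snprintf` write straight into a string's storage. `format_pair` and the initial size of 32 are made up for illustration; the pattern assumes C++11 or later, where the storage is guaranteed to be contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical example function: formats "<key>=<value>" directly into the
// string's own storage instead of going through a temporary char array.
std::string format_pair(const char* key, int value) {
    assert(key != nullptr);
    std::string s;
    // resize() creates real elements (all '\0'), so s[0]..s[size()-1] may be
    // written. reserve() only sets capacity; writing there would be UB.
    s.resize(32);
    int written = std::snprintf(&s[0], s.size(), "%s=%d", key, value);
    if (written < 0) return std::string();  // encoding error
    if (static_cast<std::size_t>(written) >= s.size()) {
        // Output was truncated: grow to the reported length (+1 for '\0') and retry.
        s.resize(static_cast<std::size_t>(written) + 1);
        written = std::snprintf(&s[0], s.size(), "%s=%d", key, value);
        if (written < 0) return std::string();
    }
    // Shrink to the characters actually produced, dropping the '\0' that
    // snprintf stored inside the string's elements.
    s.resize(static_cast<std::size_t>(written));
    return s;
}
```

The final `resize()` matters: without it the string would keep its padded size and contain trailing null characters.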
20

Use std::vector<char> if you want a real buffer.

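#include <windows.h>
#include <string>
#include <vector>

int main(){
  std::vector<char> buff(MAX_PATH+1);
  // Returns the number of characters written, not counting the terminating
  // null; 0 means failure, and a value >= the buffer size means it was too small.
  DWORD len = ::GetCurrentDirectoryA(static_cast<DWORD>(buff.size()), &buff[0]);
  if (len == 0 || len >= buff.size()) return 1;
  // Copy only the characters that were written, not the whole buffer.
  std::string path(buff.begin(), buff.begin() + len);
}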

Example on Ideone.

Xeo
  • The first parameter for GetCurrentDirectory is supposed to be the length of the buffer. Your code initializes the buffer length to MAX_PATH but then states that MAX_PATH+1 characters are available. So your code runs the risk of GetCurrentDirectory writing a NULL character past the end of the vector - not good. – James Johnston Oct 20 '11 at 13:57
  • @James: Thanks, I don't really work with the winapi functions. :) Well, easy to fix... Also, isn't the same problem present in the OP's code? – Xeo Oct 20 '11 at 13:58
  • Yeah you're right; I didn't read his code. But the error exists in OP's code as well. – James Johnston Oct 20 '11 at 14:05
  • Not really, I think. GetCurrentDirectory states that "the buffer length must include room for a terminating null character" and CString::GetBuffer() says "The minimum size of the character buffer in characters. This value does not include space for a null terminator". So I think I got it right. – MikMik Oct 20 '11 at 14:20
  • @Mik: Ok, sorry then, like I said, I don't work with the winapi and CString n stuff. :) – Xeo Oct 20 '11 at 14:22
  • 3
    If you use std::vector as a *buffer* you'll end up initializing every **element** in the vector in the most expensive manner possible. This will dwarf any CPU cost throughout your application. Use `std::unique_ptr` or stack allocation. Don't waste CPU on initializing a buffer if you don't need to. – John Leidegren Apr 07 '13 at 14:24
  • 1
    @John: You're insane. MAX_PATH is only about 256. The cost of initializing such a vector is irrelevant. – Puppy Apr 07 '13 at 14:39
  • 3
    @DeadMG I'm speaking from experience only, run your program through a profiler and check for yourself but you obviously need more than 1 call for it to be a bottleneck. The reason I bring it up is because if you put that in a library routine which you call often, you're going to waste a lot of CPU. Moreover, since you know MAX_PATH at compile-time, this could be stack allocated, the duration of that buffer is in all likelihood going to be very short. The notion that the vector class represents a *real buffer*, in any way, is completely fallacious. – John Leidegren Apr 07 '13 at 15:44
  • 1
    Please take under advisement that `std::vector` does not do anything smart, such as `memset(..., 0, sizeof ...)`, which would be fast(er); it does proper initialization of every element. – John Leidegren Apr 07 '13 at 15:48
  • @JohnLeidegren Ever heard of `vector<char> buffer; buffer.reserve(buffer_size);`? No memset involved. Contiguous memory. No pointers. No fuss. No lies. – rubenvb Apr 07 '13 at 18:12
  • 2
    @JohnLeidegren: The kernel switch to GetCurrentDirectory is probably more expensive. You would need not just more than 1, but a massive number of calls for this to be a bottleneck. Your comment implies that it is likely to be a serious problem, whereas in fact it is hideously unlikely to be a problem. – Puppy Apr 07 '13 at 18:15
  • 1
    I stand corrected. It appears my `vector` buffer as used above invokes undefined behavior. Then I change my opinion to `std::array`. If the buffer can't fit on the stack, it ain't worth the name buffer. – rubenvb Apr 07 '13 at 18:21
  • 1
    @DeadMG I'm not arguing that it is faster/slower than a kernel transition. I'm making the point that, as a buffer, it is unnecessarily slow. *My* opinion is that it's a bad habit. Even small, seemingly insignificant code like this piles up, and eventually does lead to a noticeable performance impact. – John Leidegren Apr 08 '13 at 07:00
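
For reference, the two alternatives named in the comments above (`std::array` on the stack and `std::unique_ptr<char[]>`) might be used like this. This is a sketch only: `fake_c_api` is a made-up stand-in that merely mimics the shape of a call like GetCurrentDirectory, and the size 260 is an arbitrary illustrative value. Whether skipping the initialization is measurable is exactly what the comments disagree on, so profile before choosing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Made-up stand-in for a C API that fills a caller-supplied buffer and
// returns the number of characters written, excluding the terminating
// '\0' (0 on failure, including a buffer that is too small).
std::size_t fake_c_api(char* out, std::size_t capacity) {
    static const char kText[] = "C:\\example\\dir";
    const std::size_t len = sizeof(kText) - 1;
    if (out == nullptr || capacity <= len) return 0;
    std::memcpy(out, kText, len + 1);  // text plus '\0'
    return len;
}

// Size known at compile time: stack storage, no heap allocation, and no
// per-element initialization (the contents start out indeterminate).
std::string via_stack_array() {
    std::array<char, 260> buf;
    const std::size_t len = fake_c_api(buf.data(), buf.size());
    return std::string(buf.data(), len);  // copies only what was written
}

// Size known only at run time: new char[n] also leaves the chars uninitialized.
std::string via_unique_ptr(std::size_t capacity) {
    assert(capacity > 0);
    std::unique_ptr<char[]> buf(new char[capacity]);
    const std::size_t len = fake_c_api(buf.get(), capacity);
    return std::string(buf.get(), len);
}
```

Both variants still copy the result into the `std::string` at the end; what they avoid is zero-filling the scratch buffer first.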
3

Not portably, no. The standard does not guarantee that a std::string has an exclusive linear representation in memory (the old C++03 standard even permits data structures such as ropes), so the API does not give you write access to one. An implementation must either be able to convert its internal representation to a linear one (in C++03) or expose the linear representation it already has (which C++11 requires), but only for reading. You can access it using data() and/or c_str(). Because that access is read-only, the interface still supports copy-on-write.

The usual recommendation for working with C APIs that modify arrays through pointers is to use a std::vector, which is guaranteed to have a linear memory representation exactly for this purpose.

To sum this up: if you want to do this portably and you want your string to end up in a std::string, you have no choice but to copy the result into the string.

ltjax
  • 1
    which standard do you refer to? c++11 _does_ guarantee that. Again, see [this background discussion](http://stackoverflow.com/questions/7554039/is-stringc-str-no-longer-null-terminated-in-c11/7554172) – sehe Oct 20 '11 at 14:08
  • It kinda depends on how far you take object identity. Yes, the new standard does guarantee that a linear representation of the string (as in char memory-block) exists in memory. No, it does not guarantee that this is this string's exclusive representation. After all, copy-on-write implementations still match the spec, hence ``data()`` and ``c_str()`` remain const. I'll update my answer to reflect the "exclusivity" of that buffer. – ltjax Oct 20 '11 at 14:30
  • 1
    COW is more or less obsolete with c++0x move semantics; also it is very inconvenient since C++0x multithreading specification; COW is hardly ever a performance benefit in concurrent programming (due to locking required) and has far too many performance surprises. In fact, I remember reading that C++11 would outlaw COW implementations of std::string (but I still can't find the link) – sehe Oct 20 '11 at 14:34
  • I agree that COW implementations are stupid in general, but the specs look like they still allow it. Even though this is off-topic: While COW can be used to emulate move-semantics, it can still be beneficial for long strings that exist in multiple instances - even if just for the memory savings. – ltjax Oct 20 '11 at 14:53
  • @sehe i would say that concurrent programming is the thing that's fairly useless and that COW is the thing that comes in handy for efficient passing around of strings without pointers and references. do async for your i/o, use message queues to talk between processes, and forget about messy locks! – Erik Aronesty Jan 15 '15 at 20:37
  • As far as I understand, COW implementations are prohibited since C++11, no? – Alexandr Zarubkin Dec 04 '18 at 18:29
2

According to this MSDN article, I think the best approach for what you want to do is to use std::wstring directly. Second best is std::unique_ptr<wchar_t[]> and third best is std::vector<wchar_t>. Feel free to read the article and draw your own conclusions.

// Get the length of the text string
// (Note: +1 to consider the terminating NUL)
const int bufferLength = ::GetWindowTextLength(hWnd) + 1;
// Allocate string of proper size
std::wstring text;
text.resize(bufferLength);
// Get the text of the specified control
// Note that the address of the internal string buffer
// can be obtained with the &text[0] syntax
// GetWindowText returns the number of characters copied, excluding the NUL
const int copied = ::GetWindowText(hWnd, &text[0], bufferLength);
// Resize down the string to avoid bogus double-NUL-terminated strings
text.resize(copied);
sam msft
1
 std::string str("Hello world");
 LPCSTR sz = str.c_str();

Keep in mind that sz will be invalidated when str is reallocated or goes out of scope. You could do something like this to decouple from the string:

 std::vector<char> buf(str.begin(), str.end()); // not null terminated
 buf.push_back(0); // null terminated

Or, in old-fashioned C style (note that this will not allow strings with embedded null-characters):

 #include <cstdlib> // free
 #include <cstring> // strdup (POSIX, not standard C++)

 char* sz = strdup(str.c_str());

 // ... use sz

 free(sz);
sehe
  • @sehe: I'm not trying to provoke a flame war. I've just read too many posts containing `LPCSTR` and other Windows-isms without a `winapi` tag on this website. – Fred Foo Oct 20 '11 at 14:36
  • @larsmans: AFAICT the tagging `winapi` is not mandatory. Moreover, we are allowed to add it (in fact, I'll start doing so, since now I learned about it) – sehe Oct 20 '11 at 14:40
  • @sehe: ok. No offence taken, I hope? I only realised later how harsh my words might have seemed. Irony is hard to convey on the internet, so I keep finding out :) – Fred Foo Oct 20 '11 at 17:39
  • 1
    Oh and seeing your updated answer: `strdup` is not available on the Windows platform, IIRC; it's called [`_strdup`](http://msdn.microsoft.com/en-us/library/y471khhc%28v=vs.80%29.aspx) there. – Fred Foo Oct 20 '11 at 17:42
1

It has c_str, which on all C++ implementations that I know of returns the underlying buffer (but as a const char *, so you can't modify it).
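
For C functions that only read the string, `c_str()` is all you need. A minimal sketch, with `c_length` and `demo_c_length` as made-up names for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical wrapper: hands the string's buffer to a read-only C
// function (std::strlen) without making a copy.
std::size_t c_length(const std::string& s) {
    const char* p = s.c_str();  // const: reading is fine, writing is not
    return std::strlen(p);
}

void demo_c_length() {
    assert(c_length(std::string("Hello world")) == 11);
    // C functions stop at the first '\0', so an embedded null hides the rest.
    assert(c_length(std::string("ab\0cd", 5)) == 2);
}
```

The second assertion shows a general caveat of passing a std::string to C code: the string may contain null characters that the C side treats as the end.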

Fred Foo
  • "The C++ Programming Language, 3rd Ed." says about string::data(): "writes the characters of the string into an array and returns a pointer to that array". And about c_str: "The c_str() function is like data(), except that it adds a 0 (zero) at the end [...]". So they return copies (although implementers decide to return a pointer to the buffer) – MikMik Oct 20 '11 at 13:56
  • 1
    @MikMik: under all known implementations it returns the buffer. Also, with the new C++11 standard, this is required (implicitly) due to the complexity requirements - see [this explanation](http://stackoverflow.com/questions/7554039/is-stringc-str-no-longer-null-terminated-in-c11/7554172#7554172) – sehe Oct 20 '11 at 14:06
  • 1
    I'm no expert, but "under all known implementations it returns the buffer" is not "it MUST return the buffer". Now, in C++11 it is required? Good to know, but I don't have C++11 yet. Anyway, if I can't write to the buffer, it is of no help to me right now. – MikMik Oct 20 '11 at 14:23
-1

I think you will be frowned upon by the purists of the STD cult for doing this. In any case, it's much better not to rely on the bloated and generic standard library. If you want a dynamic string type that can easily be passed to low-level API functions that modify its buffer and size at the same time, without any conversions, then you will have to implement it! It's actually a very challenging and interesting task. For example, in my custom txt type I overload these operators:

ui64 operator~() const; // Size operator
uli32 * operator*();    // Size modification operator
ui64 operator!() const; // True Size Operator
txt& operator--();      // Trim operator

And also these conversion operators:

operator const char *() const;
operator char *();

And as such, I can pass the txt type to low-level API functions directly, without even calling .c_str(). I can also pass the API function its true size (i.e. the size of the buffer) and a pointer to the internal size variable (operator*()), so that the API function can update the number of characters written, giving a valid string without the need to compute the string length at all!

I tried to mimic basic types with this txt, so it has no public functions at all; the entire public interface is via operators. This way my txt fits perfectly with ints and other fundamental types.

ScienceDiscoverer
  • My issue with this answer: it's perfectly reasonable to define a new class if the standard library doesn't fit your needs. However, you don't show us the *implementation* of your new class (to avoid the supposed bloat of the standard library version). Instead you tell us about a set of bizarre operator overloads that make this class behave like no other class in the world (negation for "size"!?). – DavidW Aug 12 '22 at 07:14