8

I should preface this question by saying I think the answer is probably no, but I'd like to see what other people think about the issue.

I spend most of my time writing C++ that interacts with the Win32 API, which, like most C-style APIs, wants to either:

  1. Take buffers which I've provided and operate on them.
  2. Or return pointers to buffers which I need to later free.

Both of these scenarios essentially mean that if you want to use std::string in your code, you have to accept a lot of string copying: every time you construct a std::string from a temporary buffer, the contents are copied.

What would be nice would be:

  1. To be able to let C-style APIs safely mutate a std::string directly, after reserving its allocation and setting its size in advance (to mitigate scenario 1)
  2. To be able to wrap a std::string around an existing char[] (to mitigate scenario 2)

Is there a nice way to do either of these, or should I just accept that there's an inherent cost in using std::string with old-school APIs? It looks like scenario 1 would be particularly tricky because std::string implementations often have a short string optimisation, whereby the buffer could live either inside the string object itself or on the heap, depending on its size.

Lightness Races in Orbit
Benj
  • @Tomalak Because the question talks about C/C++ interop? – Konrad Rudolph Oct 14 '11 at 09:48
  • How about using a `std::vector` along with `.data()` for access? – Kerrek SB Oct 14 '11 at 09:50
  • @Konrad: It would equally apply to interop between C++ and C++-without-strings, and he's writing in C++. I don't think C really has much to do with it. – Lightness Races in Orbit Oct 14 '11 at 09:54
  • @KerrekSB `data` provides you with a read-only buffer, but not a C-style string (not necessarily null terminated). `c_str` is the way to go. – Konrad Rudolph Oct 14 '11 at 10:00
  • @KonradRudolph: Well, yes, you have to put some extra work in, but at least you get a managed, dynamic, mutable array... Chances are the C API will terminate your string, too. – Kerrek SB Oct 14 '11 at 10:26
  • @KerrekSB `data` is *not* mutable! It (and its return value) is `const`, just like `c_str`. The standard (§21.3.6.4) is very explicit: “The program shall not alter any of the values stored in the character array.” – Konrad Rudolph Oct 14 '11 at 11:38
  • @Konrad: Kerrek said "how about using `std::vector` along with `.data()`". So he means `vector::data`, not `basic_string::data`. – Steve Jessop Oct 14 '11 at 11:43
  • @Steve Damn. Reading comprehension is such an undervalued skill … – Konrad Rudolph Oct 14 '11 at 12:17

5 Answers

10

In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.

Before C++11, you could use .data() or .c_str(), but the string is not mutable through these.

Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.
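
A minimal sketch of the &str[0] approach, assuming C++11 or later. Here `fill_buffer` is a hypothetical stand-in for a C-style API that writes into a caller-supplied buffer and reports how many characters it wrote; it is not a real Win32 function. Note that the string is sized with its constructor (or `resize`) rather than `reserve`, since only elements within `size()` may be written.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API (an assumption for illustration): writes at most
// cap-1 characters plus a terminating '\0' into buf and returns the number
// of characters written, not counting the terminator.
std::size_t fill_buffer(char* buf, std::size_t cap) {
    static const char msg[] = "hello from C";
    if (cap == 0) return 0;
    std::size_t n = sizeof(msg) - 1;
    if (n > cap - 1) n = cap - 1;
    std::memcpy(buf, msg, n);
    buf[n] = '\0';
    return n;
}

std::string read_into_string() {
    std::string s(64, '\0');                       // give the string a real size up front
    std::size_t n = fill_buffer(&s[0], s.size());  // C++11: elements are contiguous
    s.resize(n);                                   // truncate to what was actually written
    return s;
}
```

Calling `read_into_string()` yields a std::string holding only the characters the API reported, with no intermediate buffer.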

Lightness Races in Orbit
  • But not mutable :-( This is really no different from `c_str()` in previous versions of C++. – Kerrek SB Oct 14 '11 at 09:50
  • @Konrad @Kerrek: If you guys are referring to `21.4.5/2`, I think that it's only the referenced value in the `pos >= size()` case that can't be modified. To explicitly have a non-`const` `op[]` and then say that we can't modify that value at all, ever, in our own string, would be ridiculous. – Lightness Races in Orbit Oct 14 '11 at 09:52
  • @tenfour: "Take buffers which I've provided and operate on them." – Lightness Races in Orbit Oct 14 '11 at 09:53
  • Why was this downvoted? You can of course write to `&str[0]` or a pointer to it. `operator[]` is overloaded both as const and non-const. `reserve` allows you to do just what its name says, and `resize` lets you truncate the length to the number of characters the C API function reported having written afterwards. Further the allocated memory is guaranteed to be contiguous. Why should this not work? – Damon Oct 14 '11 at 09:57
  • I take my comment back. Just take care to reserve enough elements beforehand. Also, I don’t know if `std::string` can be coerced to set a new length correctly. Can `resize` be used (I don’t see a reason why not, but I’m unsure). – Konrad Rudolph Oct 14 '11 at 10:01
  • It seems to me that if you have to resize the string before you can pass it to a C function (instead of just doing reserve to reserve space) that is unlikely to be any more efficient than just passing a temporary vector and copying it into a string afterwards.... Probably demonstrates your intent more clearly though which is a good thing. – jcoder Oct 14 '11 at 10:26
  • @TomalakGeret'kal: Thanks for starting that question (and see my comment there). It would be really nice if one could use `std::string` like a magic version of `std::vector` that always has a null at the end, but somehow I don't think we're going to be that lucky. – Kerrek SB Oct 14 '11 at 10:30
  • @KerrekSB: Um, that's exactly how `std::string` is. – Lightness Races in Orbit Oct 14 '11 at 10:33
  • @TomalakGeret'kal: Well, not if we can't modify the elements though the `data()` pointer! – Kerrek SB Oct 14 '11 at 10:34
  • @KerrekSB: Why? What's the link? And we couldn't modify them through `data()` in C++03, either, when modifying the result of non-`const` `op[]` was unambiguously OK. – Lightness Races in Orbit Oct 14 '11 at 10:35
  • @TomalakGeret'kal: I'm not saying anything deep -- just that you can pass a vector-data to a C API that wants a mutable array, and you can't do that with a string. So string isn't a magic substitute for vector -- it's just different (more restrictive w.r.t. mutability, but a richer interface). – Kerrek SB Oct 14 '11 at 10:37
  • @Kerrek: From that other thread, it seems you're just about the only one who believes that ;) – Lightness Races in Orbit Oct 14 '11 at 10:40
  • @Tomalak: a slight concern - `str[str.size()]` returns a reference to a 0 terminator, but is it guaranteed that for a non-empty string, `(&str[str.size()-1])+1` points to a 0 terminator? The text of the contiguity requirement for strings at 21.4.1/5 (it says ` – Steve Jessop Oct 14 '11 at 11:38
  • @JohnB - With regard to your point about the resize() making it no more efficient than doing the copy. I don't think this is true, my implementation doesn't realloc, it just truncates the string to the size specified, this is going to be more efficient than potentially doing two allocations (one for the temporary and one for the std::string). – Benj Oct 14 '11 at 13:17
  • @Benj: argharghargh rampant comma abuse! – Lightness Races in Orbit Oct 14 '11 at 13:21
  • @Tomalak the OCD is strong with this one. ;-) – Benj Oct 14 '11 at 13:24
  • @Benj: Yes! Stop making it hurt! – Lightness Races in Orbit Oct 14 '11 at 13:25
  • I'm not sure what of you speak, is grammar English not my good? ;-) – Benj Oct 14 '11 at 13:27
  • @Benj I meant making the string big enough to pass to the API. I was thinking it would have to call the constructor for each element, which would be as slow as copying later.... However I was thinking wrong, that wouldn't really apply for char elements. – jcoder Oct 14 '11 at 13:36
0

Since C++11, you don't have to use temporary buffers. You can use strings and C strings interchangeably, and even write into the buffer of a C++ string, but you need to take the address of string::front() (i.e. &str.front()) rather than use string::data() or string::c_str(), as those only return const char* (a non-const data() overload was added only in C++17). See Directly write into char* buffer of std::string.
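
As a rough illustration, assuming C++11 or later, the sketch below formats directly into a string's own storage through the address of `front()`. The function name `format_id` and the buffer size of 32 are arbitrary choices for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats "id=<n>" straight into the string's storage, with no temporary buffer.
std::string format_id(int id) {
    std::string s(32, '\0');  // front() requires a non-empty string
    int n = std::snprintf(&s.front(), s.size(), "id=%d", id);
    if (n < 0) {
        n = 0;                                // encoding error: return an empty string
    } else if (static_cast<std::size_t>(n) >= s.size()) {
        n = static_cast<int>(s.size()) - 1;   // output was truncated to fit
    }
    s.resize(static_cast<std::size_t>(n));    // keep only the characters written
    return s;
}
```

The string must already have a non-zero size before `&s.front()` is taken, and it is shrunk afterwards to the length the C function reports.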

mo FEAR
0

You may be able to use a std::vector<char> instead. You can pass a pointer to its first element directly into C code and let the C code write to it, which you can't portably do with a string before C++11. And many of the operations you'd want to perform on a string can be done on a std::vector<char> just as well.
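
A small sketch of the vector approach. `get_name` is a hypothetical C-style function invented for the example, assumed to NUL-terminate whatever it writes. The final conversion to std::string is the one copy this approach cannot avoid.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C-style API (an assumption for illustration): fills buf with
// a NUL-terminated name, truncating it to fit in cap bytes.
void get_name(char* buf, std::size_t cap) {
    if (cap == 0) return;
    std::strncpy(buf, "example", cap - 1);
    buf[cap - 1] = '\0';
}

std::string name_via_vector() {
    std::vector<char> buf(64);         // elements are value-initialised to '\0'
    get_name(buf.data(), buf.size());  // vector::data() is C++11; use &buf[0] in C++03
    return std::string(buf.data());    // copies up to the first '\0'
}
```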

Lightness Races in Orbit
jcoder
  • I do use this method for non-string related buffers and it works well. However, it kind of sucks to use this for strings since you then can't use any of string's methods on the buffer, which nullifies the point of doing it. – Benj Oct 14 '11 at 09:55
  • Almost three years and you never bothered to read the formatting help available right above the edit box? Never wondered how people get those code sections? – Lightness Races in Orbit Oct 14 '11 at 09:59
  • Sorry, yes I suck :P (And have entered code sections plenty of times before) I was entering the reply from my phone, couldn't get it to work, and couldn't see the help section. Plus I'm supposed to be working :P Which is no excuse, I'll agree. – jcoder Oct 14 '11 at 10:06
  • @Benj Many string functions can be done in alternate ways, for example using std::find instead of a find member. But yes, it's not a good solution in many cases. In C++03 it can be the best one sometimes though – jcoder Oct 14 '11 at 10:07
0

I think the only thing that you can do safely with std::(w)string here is pass it as an input that's not going to be modified by its user; use .c_str() to get a pointer to (W)CHAR.
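
For the read-only case, a minimal sketch might look like the following. `count_chars` is a hypothetical stand-in for any C-style function that only reads its argument.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API (an assumption for illustration) that only reads text.
std::size_t count_chars(const char* text) {
    return std::strlen(text);
}

std::size_t measure(const std::string& s) {
    // c_str() returns a NUL-terminated pointer that stays valid until s is
    // next modified or destroyed; no copy is made for the call.
    return count_chars(s.c_str());
}
```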

Lightness Races in Orbit
Alexey Frunze
  • Yes, I suspect you're correct but I'm hoping someone comes up with a genius solution ;-) – Benj Oct 14 '11 at 09:56
0

Well you could probably just const_cast the .data() of a string to char* and it would most likely work. As with all optimisations, make sure that it is actually this bit of the code that is the bottleneck. If it is, wrap this up in an inline-able function, or a template class or something so that you can write some tests for it and change the behaviour if it doesn't work on some platform.

Ayjay
  • This *probably* works but it’s *definitely* unportable from a standards point of view. Tomalak’s solution is the only legal solution but it only works in C++11. – Konrad Rudolph Oct 14 '11 at 10:04
  • I would definitely avoid this if it is at all possible, but I do wonder - what platforms don't lay out strings sequentially? – Ayjay Oct 14 '11 at 10:34
  • None of any importance. Herb Sutter reported a straw poll at a C++0x meeting, where they discussed adding the contiguity requirement to `std::string`, and nobody could think of an implementation actually in production, that didn't already use contiguous strings. The reason for doing the poll was to allow them to add the requirement without agonising over whether it removed any optimization opportunities that anyone was using, or otherwise created an implementation burden. So unless you use an implementation that those present don't know about (possible but unlikely), you'd be OK. – Steve Jessop Oct 14 '11 at 11:29
  • And of course it's perfectly reasonable for any particular implementation to guarantee that what you want to do, works. I don't know whether any of them do that explicitly, but you can even "guarantee" it all by yourself for a particular version, by examining the source. You just can't write code that's formally portable that way. – Steve Jessop Oct 14 '11 at 11:30