16

I have a function in a library that takes in a char* and modifies the data.

I tried passing it the result of c_str(), but the C++ docs say that returns a const char*.

What can I do other than allocating a new char array and copying the string into it?

John Dibling
jmasterx

5 Answers

17

You can use &str[0] or &*str.begin() as long as:

  • you preallocate explicitly all the space needed for the function with resize();
  • the function does not try to exceed the preallocated buffer size (you should pass str.size() as the argument for the buffer size);
  • when the function returns, you explicitly trim the string at the first \0 character you find, otherwise str.size() will return the "preallocated size" instead of the "logical" string size.
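The three conditions above can be sketched as follows. This is only an illustration: fill_greeting is a hypothetical stand-in for the library function, assumed here to take a buffer plus its size and to write a NUL-terminated result, and the buffer size of 32 is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the library function (not a real API):
// writes a NUL-terminated result into buf, touching at most size bytes.
void fill_greeting(char* buf, std::size_t size) {
    assert(size > 0);                      // needs room for the terminator
    std::strncpy(buf, "hello", size - 1);  // copies, then zero-pads the rest
    buf[size - 1] = '\0';
}

// Preallocate, pass the buffer with its size, then trim.
std::string call_with_string_buffer() {
    std::string str;
    str.resize(32);                        // 1. preallocate all the space
    fill_greeting(&str[0], str.size());    // 2. pass the size as the limit
    // 3. trim at the first '\0'; c_str() is always terminated, so strlen
    //    cannot run past the end even if the function filled the buffer.
    str.resize(std::strlen(str.c_str()));
    return str;
}
```

Without the final resize, str.size() would still report 32 rather than the length of the text the function wrote.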

Notice: this is guaranteed to work in C++11 (where strings are guaranteed to be contiguous), but not in previous revisions of the standard; still, no implementation of the standard library that I know of ever did implement std::basic_string with noncontiguous storage.

Still, if you want to play it safe, use std::vector<char> (guaranteed to be contiguous since C++03): initialize it with whatever you want (you can copy the data from a string using the constructor that takes two iterators, then append a null character), resize it as you would a std::string, and copy it back to a string, stopping at the first \0 character.
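A minimal sketch of that vector round trip, assuming a hypothetical library function shout that upper-cases a NUL-terminated string in place (the name and its behavior are made up for the example):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical library function (not a real API): upper-cases ASCII
// letters of a NUL-terminated string in place.
void shout(char* s) {
    assert(s);
    for (; *s != '\0'; ++s) {
        if (*s >= 'a' && *s <= 'z') {
            *s = static_cast<char>(*s - 'a' + 'A');
        }
    }
}

std::string modify_via_vector(const std::string& input) {
    // Copy the characters with the two-iterator constructor, then add
    // the terminator that the iterator range does not include.
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');

    shout(&buf[0]);  // the function works on the writable copy

    // Constructing from char* stops at the first '\0'.
    return std::string(&buf[0]);
}
```

Calling modify_via_vector("abc def") leaves the caller's string untouched and returns the modified copy.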

Matteo Italia
  • 4
    Note: This is only guaranteed in C++11, in C++03 strings could be implemented as ropes (list of discontiguous chunks of memory) – David Rodríguez - dribeas Jun 06 '12 at 15:03
  • 1
    @DavidRodríguez-dribeas: correct in theory, but nobody ever did it; someone implemented ropes, but as a separate class, I suppose to avoid e.g. making `c_str()` calls expensive or breaking all the code that assumes strings are stored in contiguous memory. – Matteo Italia Jun 06 '12 at 15:05
  • Just to be safe, you can also explicitly nul-terminate your `string` by adding a 0 character to the end of it before you pass anything to the C function. That way, there's no concern over what happens if the C function writes to the terminator that it finds. – Steve Jessop Jun 06 '12 at 15:09
  • Actually this is *not* compliant even with C++11: `operator[]`, even the non-const version, requires "the referenced value shall not be modified." It breaks COW. The library could return a shared byte sequence of `n` NUL characters, never allocate a new one, and be totally surprised when you modify it. `&* str.begin()` however does appear kosher. – Potatoswatter Jun 06 '12 at 15:11
  • @Potatoswatter: [citation needed] –  Jun 06 '12 at 15:16
  • 1
    @Fanael C++11 21.4.3 [string.iterators] doesn't mention not modifying; 21.4.5 [string.access] and 21.4.7.1 [string.accessors] both do. – Potatoswatter Jun 06 '12 at 15:21
  • 1
    @Potatoswatter: I think there's some punctuation confusion in `21.4.5/2`. I believe that "the reference shall not be modified" is intended to apply only to the case where `pos == size()`, but that is not clear from the semi-colon. It's confusing because normally in English a comma has higher precedence than a semi-colon. – Steve Jessop Jun 06 '12 at 15:21
  • @SteveJessop that requires putting the semicolon at higher precedence than the comma. And there is a good argument for COW. And that doesn't explain why [string.accessors] explicitly mentions the entire sequence. (I initially just looked up `data` expecting to verify that it returned non-const, but it's just the same as `c_str`.) – Potatoswatter Jun 06 '12 at 15:24
  • @SteveJessop: In N3376, it's very much unambiguous: "*Returns*: `*(begin() + pos)` if `pos < size()`. Otherwise, returns a reference to an object of type `charT` with value `charT()`, where modifying the object leads to undefined behavior." –  Jun 06 '12 at 15:27
  • @Fanael: ah, thanks, I was using the FDIS. Problem solved, no COW in C++11 (or at least if you do implement COW, then you must take your copy when someone calls `operator[]`, not wait for them to actually modify the string). – Steve Jessop Jun 06 '12 at 15:30
  • Oh… I shouldn't be doing this right now. Because `operator[]` calls `begin`, and `begin` notifies any COW mechanism that a change may occur, then `operator[]` does prepare it for a change. – Potatoswatter Jun 06 '12 at 15:30
  • @Potatoswatter: if your interpretation were true, `std::string x = "acbd"; x[0] = 'c';` would be illegal, mind you. –  Jun 06 '12 at 15:30
  • @Fanael Yes, I was thinking COW implementations *would* behave unpredictably in practice, one reason why they're unpopular. (Unlike discontiguous representations, it has been tried.) – Potatoswatter Jun 06 '12 at 15:33
  • @Potatoswatter: I haven't looked into this properly, but I suppose that if you want to take advantage of any possible COW-ness of the implementation then you restrict yourself to `data()` and `c_str()`. Maybe `operator[] const`, depending whether it's guaranteed to return the same address as `operator[] non-const`, and the same question applies to `begin() const`. – Steve Jessop Jun 06 '12 at 15:37
  • @SteveJessop Well there's now `cbegin` for that purpose. But it's probably not worth the trouble. If you want pooled strings, use a pooled string class… it's proven that `std::string` should be as plain as possible, COW slows the common case. – Potatoswatter Jun 06 '12 at 16:03
11

Nothing.

Because std::string manages its own contents, you can't get write access to the string's underlying data. Writing to it is undefined behavior.

However, creating and copying a char array is not hard:

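std::string original("text");
// Copy the characters into a contiguous, writable buffer and NUL-terminate it.
std::vector<char> char_array(original.begin(), original.end());
char_array.push_back(0);

// some_function may now modify the copy; original is left untouched.
some_function(&char_array[0]);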
slaphappy
  • 2
    so I cannot do &string[0] safely? – jmasterx Jun 06 '12 at 14:58
  • 6
    Actually, passing `&str[0]` should be well-defined behavior as long as the function does not try to exceed the length of the buffer. – Matteo Italia Jun 06 '12 at 14:59
  • I'll just copy it into a vector of char and call it a day. – jmasterx Jun 06 '12 at 15:00
  • 5
    @MatteoItalia: only in C++11, where it's guaranteed that `std::string`'s contents is contiguous. –  Jun 06 '12 at 15:01
  • @Fanael Even C++11 still says of `c_str()` and the new `data()`: "Requires: The program shall not alter any of the values stored in the character array." Not sure why. – Potatoswatter Jun 06 '12 at 15:02
  • @MatteoItalia: C++03 made `std::vector` contiguous, but not `std::string`. –  Jun 06 '12 at 15:02
  • 1
    @MatteoItalia No, C++03 allowed noncontiguous `string` implementations, but nobody ever found such practical so they changed in favor of user flexibility and implementation rigidity. – Potatoswatter Jun 06 '12 at 15:03
  • 2
    @kbok: ugh, performing a copy that way isn't a good idea, you don't have any exception safety; if you really want to be sure use `std::vector` instead, it's guaranteed to be contiguous and is exception-safe. – Matteo Italia Jun 06 '12 at 15:03
  • 1
    @Potatoswatter: I think that's in order to let strings implement copy-on-write. – slaphappy Jun 06 '12 at 15:03
  • @kbok ah, of course. Interesting that COW hasn't been deprecated yet. – Potatoswatter Jun 06 '12 at 15:04
  • 1
    You can use the appropriate `vector` constructor to simplify this code: `std::vector<char> buf( s.begin(), s.end() ); buf.push_back( '\0' );` – Frerich Raabe Jun 06 '12 at 15:08
  • I prefer the std copy way, this way is not good for reuse. I'm not just using this once. Copying will not call new and reallocate unless needed. – jmasterx Jun 06 '12 at 15:27
  • 1
    @Potatoswatter: SGI found non-contiguous strings useful enough to implement ropes... certainly making it a distinct class is better than choosing that for the `std::string` implementation. Ahh - I just saw this has also been covered in comments on Matteo's answer.... – Tony Delroy Jun 06 '12 at 15:46
  • @TonyDelroy Yes, I just meant specifically as `std::string`. Nobody should put megabytes of text in a contiguous array… it seems more text editors understood this in the late 80's-early 90's than today. – Potatoswatter Jun 06 '12 at 16:00
  • `some_function(char_array.data())` can be used just as well. – Marc.2377 Jul 13 '16 at 07:52
3

If you know that the function will not modify anything beyond str.size(), you can obtain a pointer in one of several ways:

void f( char* p, size_t s ); // update s characters in p
int main() {
   std::string s=...;
   f( &s[0], s.size() );
   f( &s.front(), s.size() );
}

Note: this is guaranteed in C++11, but not in previous versions of the standard, which allowed rope implementations (i.e. non-contiguous memory).

David Rodríguez - dribeas
  • 1
    Although the earlier standard allowed rope implementations, the C++ committee found that none had ever been implemented which is why they changed the requirements in C++11. I wouldn't worry about it. – Mark Ransom Jun 06 '12 at 15:07
  • 1
    @Mark: the way Herb Sutter reported it on his blog, they took a straw poll of those present, whether any active implementation used ropes. None did, and no significant C++ implementation was unrepresented at C++0x committee meetings. That's not quite the same as thoroughly investigating whether there has *ever* been a ropey string, but as you say it's enough not to need to worry about it in practice, because the whole point of their question was "will any implementation we care about need to change to support this additional requirement", and the answer was no. – Steve Jessop Jun 06 '12 at 15:13
2

If the function will not try to increase the length of the string, then:

C++11:

std::string data = "This is my string.";
func(&*data.begin());

C++03:

std::string data = "This is my string.";
std::vector<char> arr(data.begin(), data.end());
arr.push_back('\0'); // the iterator-range copy does not include a terminator

func(&arr[0]);
Chad
0

Here's a class that will generate a temporary buffer and automatically copy it to the string when it's destroyed.

class StringBuffer
{
public:
    StringBuffer(std::string & str) : m_str(str)
    {
        m_buffer.push_back(0);
    }
    ~StringBuffer()
    {
        m_str = &m_buffer[0];
    }
    char * Size(int maxlength)
    {
        m_buffer.resize(maxlength + 1, 0);
        return &m_buffer[0];
    }
private:
    std::string & m_str;
    std::vector<char> m_buffer;
};

And here's how you would use it:

// this is from a crusty old API that can't be changed
void GetString(char * str, int maxlength);

std::string mystring;
GetString(StringBuffer(mystring).Size(MAXLEN), MAXLEN);

If you think you've seen this code before, it's because I copied it from a question I wrote: Guaranteed lifetime of temporary in C++?

Community
Mark Ransom