
So I have a std::string and a function that takes a char* and writes into it. Since std::string::c_str() and std::string::data() return const char*, I can't use them. So I was allocating a temporary buffer, calling the function with it, and then copying the result into the std::string.

Now I plan to work with large amounts of data, and copying this buffer will have a noticeable impact that I want to avoid.

Some people suggested using &str.front() or &str[0], but does that invoke undefined behavior?
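For reference, here is a minimal sketch of the copy-based approach described above. `fill_buffer` is a hypothetical stand-in for the C-style function that writes into a `char*`; the final `std::string` construction is the extra copy the question is trying to get rid of.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the C-style API from the question:
// writes exactly len characters into buf (here, repeating 'a'..'z').
void fill_buffer(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        buf[i] = static_cast<char>('a' + i % 26);
    }
}

// Copy-based approach: fill a temporary buffer, then copy it into a string.
std::string read_with_copy(std::size_t len) {
    std::vector<char> tmp(len);           // temporary buffer
    fill_buffer(tmp.data(), len);         // the API writes into the buffer
    return std::string(tmp.data(), len);  // second copy, into the string
}
```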

  • "*C++17 added non-const `data()` to `std::string` but it still says that you can't modify the buffer.*" Huh? Where does it say that? – ildjarn Aug 29 '16 at 07:38
  • Before accessing `&str[0]`, `&str.front()`, ..., make sure there is memory, _i.e._ that `str.size() > 0`. If you have just instantiated `str`, use `str.resize()`. – Garf365 Aug 29 '16 at 07:47
  • @ildjarn It looks like I totally misread that paper. `.data()` should just work. –  Aug 29 '16 at 07:49
  • See also: [How to convert a `std::string` to `const char*` or `char*`](https://stackoverflow.com/q/347949/4561887) – Gabriel Staples Jun 03 '22 at 06:01

4 Answers


C++98/03

Impossible. A string may be copy-on-write, so the implementation needs to control all reads and writes to its buffer.

C++11/14

In [string.require]:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

So &str.front() and &str[0] should work.
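A small sketch of what that guarantee allows, assuming C++11 or later: `is_contiguous` (a hypothetical helper) checks the quoted identity for every index, and `uppercase_in_place` writes through the `char*` obtained from `&s[0]`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Checks the [string.require] identity &*(s.begin() + n) == &*s.begin() + n
// for every n in [0, s.size()).
bool is_contiguous(std::string& s) {
    for (std::size_t n = 0; n < s.size(); ++n) {
        if (&*(s.begin() + n) != &*s.begin() + n) return false;
    }
    return true;
}

// Writes through a char* obtained from &s[0].
void uppercase_in_place(std::string& s) {
    if (s.empty()) return;  // &s[0] is only a usable buffer when size() > 0
    char* p = &s[0];        // pointer to the first element of the contiguous buffer
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (p[i] >= 'a' && p[i] <= 'z') {
            p[i] = static_cast<char>(p[i] - 'a' + 'A');
        }
    }
}
```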

C++17

str.data(), &str.front() and &str[0] work.

Here it says:

charT* data() noexcept;

Returns: A pointer p such that p + i == &operator[](i) for each i in [0, size()].

Complexity: Constant time.

Requires: The program shall not alter the value stored at p + size().

The non-const .data() just works.
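A minimal sketch of the non-const `data()`, assuming a C++17 compiler; `overwrite_prefix` is a hypothetical helper. It writes only inside `[0, size())`, since the quoted requirement forbids altering the value at `p + size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Requires C++17: std::string::data() has a non-const overload returning char*.
// Copies src over the first len characters of s without reallocating;
// s must already have size() >= len.
void overwrite_prefix(std::string& s, const char* src, std::size_t len) {
    assert(len <= s.size());
    char* p = s.data();        // char*, not const char*, since C++17
    std::memcpy(p, src, len);  // writes to [p, p + len); never touches p[size()]
}
```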

The recent draft has the following wording for .front():

const charT& front() const;

charT& front();

Requires: !empty().

Effects: Equivalent to operator[](0).

And the following for operator[]:

const_reference operator[](size_type pos) const;

reference operator[](size_type pos);

Requires: pos <= size().

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

Throws: Nothing.

Complexity: Constant time.
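To illustrate the `pos <= size()` rule quoted above (a sketch; the helper names are made up): reading `s[s.size()]` is allowed and yields `'\0'`, while writes have to stay below `size()`.

```cpp
#include <cassert>
#include <string>

// operator[](size()) is allowed (pos <= size()) and yields a reference to charT(),
// i.e. '\0' for std::string. Reading it is fine; writing through it is undefined behavior.
char char_at_end(const std::string& s) {
    return s[s.size()];
}

// Writable indices are [0, size()); this writes only inside that range.
void set_last(std::string& s, char c) {
    if (!s.empty()) s[s.size() - 1] = c;
}
```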

So operator[] is defined in terms of iterator arithmetic, which means we need to look at the requirements on iterators. Here it says:

A basic_string is a contiguous container ([container.requirements.general]).

So we need to go here:

A contiguous container is a container that supports random access iterators ([random.access.iterators]) and whose member types iterator and const_iterator are contiguous iterators ([iterator.requirements.general]).

Then here:

Iterators that further satisfy the requirement that, for integral values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous iterators.

Apparently, contiguous iterators are a C++17 feature which was added in these papers.

The requirement can be rewritten as:

assert(*(a + n) == *(&*a + n));

So, in the second part we dereference the iterator, take the address of the value it points to, do pointer arithmetic on that address, and dereference the result; this must be equivalent to advancing the iterator and then dereferencing it. This means that a contiguous iterator points into memory where each value is stored right after the previous one, hence "contiguous". Since functions that take char* expect contiguous memory, you can pass the result of &str.front() or &str[0] to them.
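A sketch of that conclusion, assuming C++11 or later: `reverse_chars` is a hypothetical C-style function that takes a `char*` and a length, and it is handed `&s.front()` after checking that the string is not empty (since `front()` requires `!empty()`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-style function: reverses len characters of buf in place.
void reverse_chars(char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    for (std::size_t i = 0, j = len; i + 1 < j; ++i, --j) {
        char tmp = buf[i];
        buf[i] = buf[j - 1];
        buf[j - 1] = tmp;
    }
}

// Hands &s.front() to a function expecting char*; front() requires !empty().
void reverse_string(std::string& s) {
    if (s.empty()) return;
    reverse_chars(&s.front(), s.size());
}
```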

  • Good answer. Just some additions: `std::string` is a contiguous-memory container only since C++11, so you can safely use `&str[0]` only since C++11 (before that, it's specific to the standard library implementation). Before C++11 you have to use a temporary `std::vector` and copy it into a `std::string`, or use it to construct a `std::string`. – Garf365 Aug 29 '16 at 07:32
  • Given that the definition of `operator[]` says "where modifying the object leads to undefined behavior." I think your answer is wrong. I think this is a defect in the standard. – Martin Bonner supports Monica Aug 29 '16 at 07:32
  • @Garf365: In practice, though, there was no standard library that *didn't* offer contiguous memory for `std::string` (since C++98), so it has always been safe, and C++11 guarantees that it always will be. – Martin Bonner supports Monica Aug 29 '16 at 07:34
  • @MartinBonner It says that only for `pos == size()`. In other words, you are not supposed to touch the terminating character. Everything before it is fair game. – Revolver_Ocelot Aug 29 '16 at 07:35
  • I think I have misread the definition of `operator[]`. The undefined behaviour only applies in the case where `pos` is not `< size()`. – Martin Bonner supports Monica Aug 29 '16 at 07:36
  • @Revolver_Ocelot: snap! – Martin Bonner supports Monica Aug 29 '16 at 07:36
  • It contains valuable research into the standard. –  Aug 29 '16 at 08:06
  • This answer seems to be misleading, no? Based on various sources, including [this cppreference.com community wiki page for `std::string::operator[]`](https://en.cppreference.com/w/cpp/string/basic_string/operator_at), writing to any position at index >= `mystr.size()` is _undefined behavior_. This means that it is a bug to write to your `std::string` as though it were a `char*` _unless you preload it with a bunch of garbage data to increase its size to the buffer size needed, no?_ It _appears_ that even `str.reserve()` is NOT sufficient, because it modifies the capacity, not the size! Correct? – Gabriel Staples May 24 '22 at 17:20
  • Update: see 1) [my comment here](https://stackoverflow.com/questions/7836863/is-there-a-way-to-get-stdstrings-buffer#comment127847227_15863513) and 2) the [notes at the bottom of my question here](https://stackoverflow.com/q/72367123/4561887) for clarification on how to preallocate a `std::string` using `my_string.resize()` in order to prepare it to be used as a `char*` buffer from index `0` to `my_string.size() - 1`, inclusive. – Gabriel Staples May 25 '22 at 05:17
  • I've [added an answer here](https://stackoverflow.com/a/72485404/4561887) to explain in detail what I said in my last 2 comments. – Gabriel Staples Jun 03 '22 at 06:28

You can simply use &s[0] for a non-empty string. This gives you a pointer to the start of the buffer.

When you use it to put a string of n characters there, the string's length (not just its capacity) needs to be at least n beforehand, because there's no way to adjust it upward afterwards without clobbering the data.

I.e., usage can go like this:

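#include <cassert>
#include <string>
using std::string;

// Hypothetical C-style API (placeholder name): writes up to n chars
// into buf and returns how many it actually stored.
int some_api_function( char* buf, int n );

auto foo( int const n )
    -> string
{
    if( n <= 0 ) { return ""; }

    string result( n, '#' );   // # is an arbitrary fill character.
    int const n_stored = some_api_function( &result[0], n );
    assert( 0 <= n_stored && n_stored <= n );
    result.resize( n_stored ); // Shrink to the length actually written.
    return result;
}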

This approach has worked formally since C++11. Before that, in C++98 and C++03, the buffer was not formally guaranteed to be contiguous. However, in practice the approach has worked since C++98, the first standard. The reason the contiguous-buffer requirement could be adopted in C++11 (it was added at the Lillehammer meeting, in 2005 I think) was that there were no extant standard library implementations with a non-contiguous string buffer.
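The requirement that the string's length, not just its capacity, be at least n is easy to check. In this sketch (helper names are hypothetical), `reserve()` leaves `size()` at 0, so writing through `&s[0]` would still be undefined behavior, while `resize()` creates real characters that may be overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// reserve() only raises capacity(); size() stays 0, so s[0..n) are not elements yet
// and writing there through &s[0] would be undefined behavior.
std::size_t size_after_reserve(std::size_t n) {
    std::string s;
    s.reserve(n);
    return s.size();
}

// resize() creates n real characters (here '\0'), so [&s[0], &s[0] + n) is writable.
std::string prepared_buffer(std::size_t n) {
    std::string s;
    s.resize(n);
    return s;
}
```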


Regarding

C++17 added non-const data() to std::string but it still says that you can't modify the buffer.

I'm not aware of any such wording, and since that would defeat the purpose of a non-const data(), I doubt that this statement is correct.


Regarding

Now I plan to work with large amounts of data, and copying this buffer will have a noticeable impact that I want to avoid.

If copying the buffer has a noticeable impact, then you'd want to avoid inadvertently copying the std::string.

One way is to wrap it in a class that's not copyable.
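One possible shape for such a wrapper, as a sketch only: `Buffer` is a hypothetical class name, copying is deleted, and moving is left enabled, which for a long string typically just transfers the heap allocation instead of copying the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

class Buffer {
public:
    explicit Buffer(std::size_t n) : s_(n, '\0') {}

    Buffer(const Buffer&) = delete;             // no accidental copies
    Buffer& operator=(const Buffer&) = delete;
    Buffer(Buffer&&) = default;                 // moves are still allowed
    Buffer& operator=(Buffer&&) = default;

    // Writable pointer to the characters; requires a non-empty buffer.
    char* data() {
        assert(!s_.empty());
        return &s_[0];
    }
    std::size_t size() const { return s_.size(); }

    // Hands the string out by move; the Buffer is left valid but unspecified.
    std::string release() { return std::move(s_); }

private:
    std::string s_;
};

static_assert(!std::is_copy_constructible<Buffer>::value, "Buffer must not be copyable");
static_assert(std::is_move_constructible<Buffer>::value, "Buffer should be movable");
```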

Cheers and hth. - Alf

I don't know what you intend to do with that string, but if all you need is a buffer of chars that frees its own memory automatically, then I usually use vector<char> (or vector<int>, or whatever element type you need).

With v being the vector, it's guaranteed that &v[0] (for a non-empty v) points to contiguous memory that you can use as a buffer.
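A minimal sketch of that pattern, assuming a hypothetical C-style function `produce_bytes` that writes up to `cap` bytes and returns how many it wrote. Note that turning the result into a `std::string` afterwards would copy it again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style API: writes up to cap bytes into buf, returns how many it wrote.
std::size_t produce_bytes(char* buf, std::size_t cap) {
    const char msg[] = "payload";
    std::size_t n = 0;
    for (; n < cap && msg[n] != '\0'; ++n) buf[n] = msg[n];
    return n;
}

// vector<char> as a self-freeing buffer: &v[0] points to contiguous storage.
std::vector<char> read_into_vector(std::size_t cap) {
    std::vector<char> v(cap);  // size() == cap, so all cap bytes are writable
    if (cap == 0) return v;    // &v[0] is not valid on an empty vector
    std::size_t n = produce_bytes(&v[0], v.size());
    v.resize(n);               // shrink to what was actually written
    return v;
}
```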

Israel Unterman
  • It's also true for `std::string` since C++11, and in practice before that too: as MartinBonner said in a comment on another answer, although that behavior was not guaranteed by the standard, most standard library implementations store `std::string` in contiguous memory. – Garf365 Aug 29 '16 at 07:45

Note: if you consider string::front() to be the same as &string[0], then the following answer is redundant:

According to cplusplus.com, in C++98 you shouldn't write to .data() or .c_str(); they are to be treated as read-only/const:

A program shall not alter any of the characters in this sequence.

In C++11 this warning was removed, but the return values are still const, so officially writing through them isn't allowed in C++11 either. So, to avoid undefined behavior, you can use string::front(), which:

If the string object is const-qualified, the function returns a const char&. Otherwise, it returns a char&.

So if your string isn't const, you are officially allowed to modify the contents through the reference returned by string::front(), which refers to the first element of the buffer. The linked page doesn't say which C++ standard this applies to, but front() was only added in C++11, so it applies to C++11 and later.

Also, front() returns a reference to the first element, not a pointer, so you'll need to take its address. It's not clear whether you are officially allowed to use that as a char* for the whole buffer, but in combination with the other answers, I'm sure it's safe. At least it doesn't produce any compiler warnings.
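A sketch that makes the two `front()` overloads and the address-taking step concrete, assuming C++11 or later; `buffer_start` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// front() on a non-const string returns char&; on a const string, const char&.
static_assert(std::is_same<decltype(std::declval<std::string&>().front()), char&>::value,
              "non-const front() yields char&");
static_assert(std::is_same<decltype(std::declval<const std::string&>().front()),
                           const char&>::value,
              "const front() yields const char&");

// Taking the address gives a char* to the start of the buffer (requires !empty()).
char* buffer_start(std::string& s) {
    assert(!s.empty());
    return &s.front();
}
```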

mo FEAR