
This question is motivated by the following code fragment from the question "How to pre-allocate memory for a std::string object?":

file.seekg(0, std::ios::end);
s4.resize(file.tellg());
file.seekg(0, std::ios::beg);

file.read(&s4[0], s4.length());

It seems to me that this is only guaranteed to be both correct (a subsequent s4.data() will return a null-terminated string of length N, total size N+1 bytes) and efficient (s4.data() won't need to do an extra reallocation/copy) if resize() allocated an extra byte for the null in the first place.

Does it indeed do so?

rwallace
  • `s4.data()` (or `&s4[0]`) **can't** reallocate/copy, so there is only the correctness question here. – Davis Herring Jun 22 '23 at 07:22
  • Please be aware that (if you are on Windows) the size of the file and the size of the data read may differ if you opened the file _without_ `std::ios::binary`. In that case, every line ending (CR LF) is converted internally to `\n` (LF). – Scheff's Cat Jun 22 '23 at 07:58
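A small sketch of the first comment's point. The helper name observers_do_not_reallocate is hypothetical. It relies only on the fact that data() and c_str() are noexcept observers that return a pointer to the existing storage, which is already null-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: data() and c_str() only observe the buffer.
// They return the pointer to the existing, already null-terminated storage,
// so calling them cannot reallocate or copy.
// Returns true when both calls leave the buffer address and capacity untouched.
bool observers_do_not_reallocate(std::string& s) {
    const char* before = &s[0];             // address of the current buffer
    const std::size_t cap = s.capacity();

    const char* d = s.data();
    const char* c = s.c_str();
    assert(c[s.size()] == '\0');            // the terminator is already in place

    return d == before && c == before && s.capacity() == cap;
}
```

For example, calling it on a string that was just resize()d to 1000 characters should return true, since neither accessor touches the allocation.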

3 Answers


The C++ Standard ([basic.string]) specifies that, for std::basic_string<charT>:

data() + size() points at an object with value charT() (a “null terminator”),

Furthermore, the data must be contiguous. Therefore, yes. There is always a terminating character allocated after the string data, to make it a valid C string.
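A minimal sketch of what this guarantee means for the question's pattern. The function make_filled is hypothetical, and std::memset stands in for file.read(&s4[0], s4.length()). After resize(n) and overwriting all n characters through &s[0], the string is still null-terminated, because the terminator lives after the n elements rather than among them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper mirroring the question: size the string first,
// then overwrite every character through &s[0].
std::string make_filled(std::size_t n, char fill) {
    std::string s;
    s.resize(n);                       // n value-initialized chars ('\0')
    if (n > 0) {
        std::memset(&s[0], fill, n);   // stand-in for file.read(&s[0], n)
    }
    // Guaranteed by [basic.string]: data() + size() points at charT().
    assert(s.data()[s.size()] == '\0');
    return s;
}
```

No extra step is needed after the write: s.c_str() can be passed straight to a C API expecting a null-terminated string of length n (assuming the data itself contains no embedded '\0').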

Davislor

If you look at the source of basic_string in libstdc++, you can see that this is done in the _M_create function:

  // NB: Need an array of char_type[__capacity], plus a terminating
  // null char_type() element.
  return _S_allocate(_M_get_allocator(), __capacity + 1);

So for libstdc++, yes, this is done.

But whether this is required depends entirely on how memory is handled by the underlying implementation. If, e.g., the underlying OS guaranteed that for a given allocation, reading the byte just past its end always yields \0, then the extra byte would not be needed.

A side note:

An implementation could also reuse the memory that holds information about the length of the string for the null termination (AFAIK this is done for the small string optimization part).

So you could have something like this:

struct string {
   size_t capacity;
   char* data;
};

And data could look like this:

[number of bytes equal to capacity][bytes required for storing length]

The length is stored as max_capacity - length_of_string, so when the string reaches maximum capacity, the bytes storing the length are 0 and can serve as the null terminator. No additional byte for the \0 then needs to be allocated (only the ones for the length).
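A minimal sketch of this layout, assuming a fixed capacity below 256 so the "remaining space" fits in a single byte. TinyString is a hypothetical class, not a real library type: it stores Capacity - length in the last byte instead of the length, so a full string needs no separate terminator byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical fixed-capacity string: the last byte stores (Capacity - length).
// When the string is full that byte is 0, so it doubles as the null terminator.
template <std::size_t Capacity>
class TinyString {
    static_assert(Capacity < 256, "remaining-space byte must fit in one unsigned char");
    char buf_[Capacity + 1];  // Capacity chars of data + 1 "remaining space" byte

public:
    TinyString() { assign(""); }

    void assign(std::string_view sv) {
        assert(sv.size() <= Capacity);  // caller must not exceed Capacity
        std::memcpy(buf_, sv.data(), sv.size());
        if (sv.size() < Capacity) {
            buf_[sv.size()] = '\0';     // explicit terminator while there is room
        }
        // Remaining space: becomes 0 (== '\0') exactly when the string is full.
        buf_[Capacity] = static_cast<char>(Capacity - sv.size());
    }

    std::size_t size() const {
        return Capacity - static_cast<unsigned char>(buf_[Capacity]);
    }

    const char* c_str() const { return buf_; }
};
```

With Capacity = 15, the object occupies 16 bytes on common compilers, and a 15-character string is still a valid C string because the length byte reads as 0.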

t.niese
  • @RichardCritten It is required that it has constant complexity, but the standard does not say how this has to be achieved. If the OS already guaranteed that `ptr_to_memory[size_of_allocation]` is valid and always returns `\0`, then no additional byte would need to be allocated. But yes, I'm not aware of any system that does that. – t.niese Jun 22 '23 at 08:02

The terminator does not count as an element of the std::string (it is not included in size()).

You could do the same by constructing the string with the right size right away:

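file.seekg(0, std::ios::end);
std::size_t size = file.tellg();  // file size in bytes
std::string str(size, '\0');      // allocate and size the string in one step
file.seekg(0);
file.read(&str[0], size);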

https://en.cppreference.com/w/cpp/io/basic_istream/read
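A self-contained sketch of this approach as a whole function (C++17). The name read_whole_file is hypothetical. It opens the file in binary mode, per the comment about CR LF translation on Windows, and trims the string to gcount() in case fewer bytes were read than tellg() reported.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <optional>
#include <string>

// Hypothetical helper: read an entire file into a std::string by sizing it first.
// Binary mode keeps the tellg() byte count consistent with what read() delivers.
std::optional<std::string> read_whole_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return std::nullopt;

    file.seekg(0, std::ios::end);
    const std::streamoff end = file.tellg();
    if (end < 0) return std::nullopt;  // tellg() returns -1 on failure

    std::string contents(static_cast<std::size_t>(end), '\0');
    file.seekg(0, std::ios::beg);
    file.read(contents.data(), static_cast<std::streamsize>(contents.size()));
    contents.resize(static_cast<std::size_t>(file.gcount()));  // keep only bytes actually read

    // No manual terminator needed: data()[size()] is always '\0'.
    assert(contents.data()[contents.size()] == '\0');
    return contents;
}
```

Returning std::optional is just one way to report a failed open; throwing an exception or returning an empty string would work equally well here.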

Hladu