Is it "legal" to increment the end iterator of a std::string in order to include the null-terminator in the range?

For example

std::string my_text{"Arbitrary string"};
std::vector<std::uint8_t> my_collection{};
my_collection.insert(my_collection.end(), std::begin(my_text), std::next(std::end(my_text)));

The reason I ask is that I'd just prefer to avoid the pointer arithmetic involved in my_text.c_str() + my_text.size() (or is it my_text.size()+1?).

I'm reasonably confident that most implementations today would behave as expected. Still, answers which include C++-legalese for the language lawyers among us are appreciated. Just so I have an airtight defense if I'm ever in C++ court.

josaphatv
  • Couldn't you just append a NUL byte yourself? Like in this answer: https://stackoverflow.com/a/505047 – Julia Mar 11 '21 at 21:04
  • There is equivalent, if not more, arithmetic involved in obtaining a non-begin iterator then incrementing it. The question is interesting, but the motive seems to be based on a false premise. – François Andrieux Mar 11 '21 at 21:04
  • Why do you need to keep '\0' in `my_collection`? – Eugene Mar 11 '21 at 21:11
  • 3
    This looks like an interesting edge case in the language. Generally, you should never access `*end()`, but for a `std::string`, `*end()` is actually the null terminator that has been guaranteed to be at the end of the string since C++11. So this should work, but I'm not sure if it is actually defined to be legal. – NathanOliver Mar 11 '21 at 21:11
  • @FrançoisAndrieux I would say that `next(begin(s), s.size())`, while "equivalent" to `end(s)`, is not the same. – josaphatv Mar 11 '21 at 21:11
  • I don't think this is legal. – Hatted Rooster Mar 11 '21 at 21:12
  • @Eugene Because I need that. `my_collection` is effectively my scratch buffer for an EEPROM that I'm going to overwrite in one go. Other components I don't have control over expect null-terminated strings in said EEPROM. – josaphatv Mar 11 '21 at 21:13
  • @josaphatv `end(s)` and `next(begin(s), s.size())` are going to be equivalent for most reasonable modern implementations of C++ : https://godbolt.org/z/37qxje – François Andrieux Mar 11 '21 at 21:14
  • @FrançoisAndrieux I know they're functionally the same, but in my opinion, `end+1` is easier to read and understand and I'd prefer it if it were allowed by the language. – josaphatv Mar 11 '21 at 21:16
  • @josaphatv Are you looking for answers that quote the standard for this? Also, if so, which version of C++ are you asking about? – NathanOliver Mar 11 '21 at 21:16
  • 2
    @josaphatv If the motive is readability, then that is understandable. But in my opinion `end(s) + 1` or `next(end(s))` should both *immediately* raise a red flag. Even if it is defined by the language, it is likely not worth the confusion this expression could cause to future readers of the code. The range `s.c_str()` to `s.c_str() + s.size() + 1` seems less suspicious and the use of `c_str` clearly indicates that the range is referring to a null terminated string. – François Andrieux Mar 11 '21 at 21:18
  • The standard doesn't require the iterators to actually be `char*` or `const char*`. If you use `&my_text[0], &my_text[0] + my_text.size() + 1` it should be safe since C++11. Ugly though ... – Ted Lyngmo Mar 11 '21 at 21:25
  • It is also not clear if `s.data() + s.size() + 1` is a valid pointer. – Slava Mar 11 '21 at 21:25
  • @Slava That pointer is always at least a valid one-past-the-end pointer. Edit : at least since C++11. – François Andrieux Mar 11 '21 at 21:26
  • @FrançoisAndrieux I cannot find where it would explicitly say so, it only says that `s.data() + s.size()` is valid and dereferencable, but nothing about next one. – Slava Mar 11 '21 at 21:28
  • From https://eel.is/c++draft/strings#basic.string.general-3 `data() + size() points at an object with value charT() (a “null terminator”)`. From https://eel.is/c++draft/basic.compound#3.4 `A value of a pointer type that is a pointer to or past the end of an object....` -> `s.data() + s.size() + 1` has to be valid. But I don't see the same guarantee for iterators: `*end()` and `end() + 1` might be invalid, since an iterator is not necessarily a `char*`. – KamilCuk Mar 11 '21 at 21:28
  • 1
    @KamilCuk if that is the case then `s.data(),s.data() + s.size() + 1` can be used as pair of iterators instead of `s.begin(),s.end() + 1` – Slava Mar 11 '21 at 21:30
  • @Slava If a pointer is dereferencable, it points to either a stand-alone object or an element in an array. An object can be treated as a 1 element array for the purpose of pointer arithmetic. In both cases, it can be treated as a pointer to an element in an array and a pointer to an object (so not a one-past-the-end pointer). So it is always safe to increment a dereferencable pointer, though it may yield a one-past-the-end pointer and might not be dereferencable. – François Andrieux Mar 11 '21 at 21:31
  • @FrançoisAndrieux cool, then it can be safely used. Note your suggestion for `c_str()` would not work, as OP wants to modify that data. – Slava Mar 11 '21 at 21:32
  • 1
    @Slava No, they are copying the string into a vector of bytes so `c_str()` will work fine. Since C++17 you could use `data()` if you needed a mutable range instead of `c_str()`. `c_str()` and `data()` do the same thing since C++11 and `data()` has a non-const overload since C++17. – François Andrieux Mar 11 '21 at 21:34
  • 1
    `data() + size() + 1` is not a valid pointer if the string is empty, since `data()` *may* return `nullptr` (depending on implementation), and `nullptr+N` is not legal. `c_str() + size() + 1` will always be a valid pointer, even if the string is empty, as `c_str()` never returns `nullptr` and `+size+1` will always point to the address after the null terminator. – Remy Lebeau Mar 11 '21 at 21:39
  • @RemyLebeau `data()` returns a pointer to an empty null terminated string when the string is empty, a pointer to a single `'\0'` character. `data()` on an `std::string` never returns `nullptr`. Edit : Even pre-C++11 it had to return a non-null pointer, though the pointer may not be safe to dereference. Though I prefaced this thread of comments with "since C++11". – François Andrieux Mar 11 '21 at 21:40
  • @FrançoisAndrieux that is guaranteed since C++11, but not before. – Remy Lebeau Mar 11 '21 at 21:43
  • 1
    @RemyLebeau You missed the edit. The `data() + size() + 1` may not be derivable pre C++11, but `data()` was never allowed to be a null pointer value, even if the string was empty. The reason it didn't work was because the pointed array may not have a null terminator yet. But again, the whole comment thread was prefaced with "since C++11". – François Andrieux Mar 11 '21 at 21:44
  • 3
    `auto view = std::string_view(my_text.c_str(), my_text.size() + 1); my_collection.insert(my_collection.end(), std::begin(view), std::end(view));` The `c_str()` method returns a pointer to a buffer that is guaranteed to contain the terminating null `'\0'` – Martin York Mar 11 '21 at 21:48
  • Visual studio under debug mode triggers an assertion if you try to increment the end iterator of a string. So either it is UB or Visual Studio is wrong (which IMO, happens sometimes). – ph3rin Mar 11 '21 at 21:55
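
The pointer-based range discussed in the comments above can be sketched as follows. This is only an illustration under the "since C++11" assumption stated in that thread; the helper name `append_with_terminator` is hypothetical and does not come from the question.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (name invented for illustration): appends the characters
// of `text` followed by its null terminator to `out`.
// Assumes C++11 or later, where c_str() points at a null-terminated array.
void append_with_terminator(std::vector<std::uint8_t>& out, const std::string& text)
{
    const char* first = text.c_str();             // never a null pointer
    const char* last = first + text.size() + 1;   // one past the terminator
    out.insert(out.end(), first, last);           // copies size() + 1 bytes
}
```

Calling `append_with_terminator(my_collection, my_text)` would copy `my_text.size() + 1` bytes, the last of which is `'\0'`, without ever incrementing a `std::string` iterator past `end()`.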

1 Answer

I would expect incrementing the end iterator to be undefined behaviour, but I can't find a reference for that.

I do have a reference stating that dereferencing `end()` is undefined behaviour: https://en.cppreference.com/w/cpp/string/basic_string/end

So the standard doesn't ensure that anything expected happen if you do so.
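
One way to avoid relying on `std::next(std::end(my_text))` at all is to copy the ordinary half-open range and then append the terminator explicitly, as suggested in the comments on the question. A minimal sketch, where the helper name `append_terminated` is hypothetical:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (name invented for illustration): copies [begin, end)
// and then adds the terminator by hand, so the end iterator is never
// incremented or dereferenced.
void append_terminated(std::vector<std::uint8_t>& out, const std::string& text)
{
    out.insert(out.end(), text.begin(), text.end());
    out.push_back(std::uint8_t{0});
}
```

This costs one extra `push_back`, but it uses only operations whose validity is not in question here.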

gerum
  • 2
    FWIW, cppreference is not *the standard*. It can be and has been wrong before. – NathanOliver Mar 11 '21 at 21:23
  • In principle you are right, but do you think the mistakes are so frequent that we should not use it as a source? Also, did you buy the official standard? – gerum Mar 11 '21 at 21:36
  • 1
    My comment was in regards to *So the standard doesn't ensure that anything expected happen if you do so.*. Nowhere in your answer do you actually cite the standard. If you want a linkable standard, you can use this: https://timsong-cpp.github.io/cppwp/ – NathanOliver Mar 11 '21 at 21:45
  • But what is the alternative you suggest? We could ignore all unofficial standard websites because they are not the official standard, but that would mean most of us would not have access to the standard. Or we assume that the pages are correct according to the standard (at least for the usual cases, which do not depend on the finest details), and that is what I am doing, and I think most people would agree. – gerum Mar 11 '21 at 21:48
  • 4
    I'm not saying don't use it, you just can't call cppreference the standard. It isn't. It's a reference site. If you say *So the standard ...*, you should be quoting the standard to back that up. – NathanOliver Mar 11 '21 at 21:50
  • 1
    This answer appears to be correct, but I'm downvoting it for the misleading mention of the reference being from the Standard. As already mentioned in the comments above, either change your link to point to the correct passage from the standard (a working draft is fine), or much simpler, replace *standard* with *cppreference* or just *reference*. – cigien Mar 12 '21 at 00:27
  • Incrementing the past the end iterator has nothing to do with dereferencing it. – rustyx Mar 12 '21 at 00:38
  • @rustyx Not directly, but the `.insert` function will dereference every iterator in the range except the end, so there will be undefined behaviour at that point, and that is what I say in my answer. – gerum Mar 12 '21 at 07:51
  • @NathanOliver OK, I don't understand your point. Do you mean it is better to refer to a page which explicitly tells us that "It's known to be incomplet and incorrekt, and it has lots of bad formatting."? – gerum Mar 12 '21 at 07:55
  • 1
    No one has said *better*. We're only saying that you should be consistent. If you quote cppreference, then say "reference", and if you're quoting the Standard (or working draft), say "standard". – cigien Mar 12 '21 at 12:32