What drawbacks would exist if std::string::substr returned std::string_view?

Question

Look at this example (taken from here):

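#include <string>
#include <string_view>

class foo {
    std::string my_str_;

public:
    std::string_view get_str() const {
        // substr returns a temporary std::string; the returned view dangles
        return my_str_.substr(1u);
    }
};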

This code is bad, because substr returns a temporary std::string, so the returned std::string_view refers to an already-destroyed object. But, if substr returned std::string_view, this problem would not exist.

Besides, it would seem logical to me for substr to return std::string_view instead of std::string, because the result is conceptually a view of the original string, and it would be faster, because no copy is made.

Would there be any drawbacks if substr returned std::string_view (besides the obvious drawback: losing some compatibility with C++14 - I'm not underrating the importance of this, I'd just like to know whether other drawbacks exist)?

Related question: How to efficiently get a `string_view` for a substring of `std::string`

`string_view` is a relatively new thing, and standards have to retain backward compatibility. — The Quantum Physicist, Sep 04 '17 at 11:01
would `std::string_view sv = my_str_; return sv.substr(1u);` help? — Dev Null, Sep 04 '17 at 11:01
@DevNull: Yes, it fixes the problem. But I'd like to know the drawbacks if `substr` returned `std::string_view` (besides the obvious mentioned drawback). "Converting" to `std::string_view` could happen automatically in certain cases (like this case: `std::string::substr` could return it). — geza, Sep 04 '17 at 11:10
You would get the opposite problem for people expecting a temporary string, perhaps passing it to a C function `f(my_str.substr(1,5).c_str());`. — Bo Persson, Sep 04 '17 at 11:17
Why insist on changing the existing method signature rather than introducing a new one, e.g. `substringview`, to make everybody happy? – W.F., Sep 04 '17 at 11:29
What happens when someone needs a modifiable substring? This is a good idea, but only if it's implemented separately from `substr()`. — Justin Time - Reinstate Monica, Jul 05 '19 at 18:54
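To make the workaround from the comments concrete, here is a minimal sketch of the question's class with the conversion to `std::string_view` done before `substr` is called. The class name `foo_fixed` and its constructor are assumptions added for illustration; only the body of `get_str` comes from the comment above.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical variant of the question's class, for illustration only.
class foo_fixed {
    std::string my_str_;

public:
    explicit foo_fixed(std::string s) : my_str_(std::move(s)) {}

    // Convert to a view first, then take the substring of the view.
    // The result points into my_str_, so it stays valid only while this
    // object is alive and my_str_ is not modified.
    // Note: string_view::substr throws std::out_of_range if my_str_ is empty.
    std::string_view get_str() const {
        std::string_view sv = my_str_;
        return sv.substr(1u);
    }
};
```

For example, `foo_fixed f("hello"); f.get_str()` should compare equal to `"ello"`. No temporary `std::string` is involved, so nothing is destroyed before the caller reads the view.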

The Quantum Physicist · Answer 1 · 2017-09-04T11:14:42.957

4

When string_view was introduced, there was a lot of debate about whether it should exist at all. The opposing arguments all stemmed from examples like the one you showed.

However, as I always tell everyone who brings up such bad examples: C++ is not Java, and it is not Python. C++ is a low-level language where you have almost full control over memory, and I repeat the cliché from Spider-Man: with great power comes great responsibility. If you don't know what string_view is, then don't use it!

The other part of your question has a simple answer, and you answered it yourself:

Would there be any drawbacks if substr returned std::string_view (besides the obvious drawback: losing some compatibility with C++14)?

The harm is that every program that relies on substr returning a copy of the string may no longer be valid. Backward compatibility is a serious matter in the computer business, which is why Intel's 64-bit processors still accept x86 instructions, and also why Intel is not out of business. It costs a lot of money to reinvent the wheel, and money is a major factor in programming. So, unless you're planning to throw all of C++ in the garbage and start over (like Rust did), you should maintain the old rules in every new version.

You can deprecate stuff, but very carefully and very slowly. But deprecation isn't like changing the API, which is what you're suggesting.
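As a sketch of what "relied on a copy" can look like in practice, consider code that takes a substring and then modifies the original string. The function `pop_first_word` is a made-up example, not something from the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the text before the first space
// and empties `line`.
inline std::string pop_first_word(std::string& line) {
    // Today `word` is a std::string that owns its characters.
    // If substr returned std::string_view, `word` would instead be a
    // view into `line`.
    const auto word = line.substr(0, line.find(' '));

    // Safe today. With a view-returning substr, this would invalidate
    // `word`, and the read on the next line would be undefined behaviour.
    line.clear();

    return std::string(word);
}
```

With `std::string line = "hello world";`, the call returns `"hello"` and leaves `line` empty. The source compiles the same way in both worlds, which is exactly why such a change would break programs silently.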

edited Sep 04 '17 at 11:14

answered Sep 04 '17 at 11:10

The Quantum Physicist


Thanks for the answer! I've edited my question a little to make myself more clear about the drawbacks part. – geza Sep 04 '17 at 11:14
The original boost string_view forbade conversions from string&& to string_view because the author correctly foresaw the dangers of allowing it. Since c++11, c++ has made great strides to improve code correctness by default. This one decision by the committee will undo all of that and introduce subtle segfaults into programs once again. With great power should also come a safety catch, so you can't wield the power without meaning to. – Richard Hodges Sep 04 '17 at 11:22
@RichardHodges That's the opposing argument from the people who didn't want `string_view`. While I totally support safety, I don't see any other solution other than `string_view`. It's either obnoxious char arrays, or copying substrings, or `string_view`. – The Quantum Physicist Sep 04 '17 at 11:25
@RichardHodges: yes, but then you could not call a `std::string_view` parameterized function with a temporary, like `std::string get_name(); fn(get_name());`, which is not good either. – geza Sep 04 '17 at 11:30
@geza Well, it's not a silver bullet :) – The Quantum Physicist Sep 04 '17 at 11:31
@RichardHodges Revisionist history much? The original - then-called `string_ref` - proposal (N3442) was based on Google's and LLVM's implementations, with Bloomberg's added in the next revision (N3512). [The modification to `boost::string_view` to disallow rvalue strings](https://github.com/boostorg/utility/commit/9960d9f395b79ee860e39064cd46961f76d2cb55) was made this February, lasted a mere 1.5 months before [it was reverted](https://github.com/boostorg/utility/commit/6c4ab93573904f6d37f74c8819ac39cd230118e9), and wasn't included in any Boost release. – T.C. Sep 04 '17 at 11:31
@T.C. ah yes, I see that the r-value protection code has been commented out in the boost source code. Tellingly, the comment `Constructing a string_view from a temporary string is a bad idea` has been left in. Whether or not my reading of history is wrong, string_view will irrevocably harm the reputation of c++ because it will be used by intermediate-level programmers who will litter their code with time bombs. Construction of string_views should be explicitly requested. It's a minor inconvenience in return for non-crashing programs. – Richard Hodges Sep 04 '17 at 11:39
@T.C.: is the discussion about this available somewhere? I'm really interested in [designing a new string class](https://stackoverflow.com/questions/44863134/reference-counted-string-copy-on-write-new-string-design), so I'd like to read every quality discussion about string. Thanks! – geza Sep 04 '17 at 11:40
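The "safety catch" argued for in these comments can be sketched as a wrapper that refuses to bind to a temporary string. `checked_view` is a hypothetical type written for this illustration; it is not part of the standard library or Boost.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical wrapper, for illustration only.
class checked_view {
    std::string_view view_;

public:
    // Viewing a named (lvalue) string is allowed.
    checked_view(const std::string& s) noexcept : view_(s) {}

    // Viewing a temporary is rejected at compile time.
    checked_view(const std::string&&) = delete;

    std::string_view get() const noexcept { return view_; }
};

static_assert(std::is_constructible_v<checked_view, const std::string&>);
static_assert(!std::is_constructible_v<checked_view, std::string>);
```

The cost is the one geza points out: a function taking `checked_view` could no longer be called as `fn(get_name())`, even though that call is safe because the temporary outlives the call.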

score 3 · Answer 2 · answered Sep 04 '17 at 10:57

3

The drawback is crystal clear: it would be a significant API breaking change vs every version of C++ going back to the beginning.

C++ is not a language that tends to break API compatibility.

answered Sep 04 '17 at 10:57

John Zwinck


re-engineering the dangerous farce that is the string_view/string relationship might not be such a bad API breakage. It should not be possible to create a copyable data-reference object such as string_view from a temporary string without an explicit cast. – Richard Hodges Sep 04 '17 at 11:19
@RichardHodges: Breaking the API of `string_view` wouldn't be nearly as disruptive as breaking the API of `string::substr()`. – John Zwinck Sep 04 '17 at 11:20
We agree on that. The committee needs to address the current serious defect around std::string_view's permissive constructor set. – Richard Hodges Sep 04 '17 at 11:23
This is the correct answer. But just have to quibble on wording, you mean the C++ standard library doesn't tend to break compatibility - not the language itself? Although that's separately true too. – Barry Sep 04 '17 at 15:53

Arthur Tacca · Accepted Answer · 2020-10-29T09:40:12.137

3

Here is a concrete (if slightly incomplete) example of code that is currently safe, but would become undefined behaviour with the change:

std::string some_fn();
auto my_substr = some_fn().substr(3, 4);
// ... make use of my_substr ...

Arguably the use of auto is a little dubious here, but it is completely reasonable (in my opinion) in the following situation, where repeating the type name would be almost redundant:

const char* some_fn();
auto my_substr = std::string(some_fn()).substr(3, 4);
// ... make use of my_substr ...

Edit: Even if substr() had always returned a std::string_view, you can imagine this code causing some pain, even if only during development/debugging.
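A runnable sketch of the same point, assuming a stand-in definition of `some_fn` (the answer only declares it): today `auto` deduces `std::string`, so the result owns its characters. A view-based version is only safe if the owning string is given a name and outlives the view.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for the answer's some_fn().
inline const char* some_fn() { return "0123456789"; }

// Current behaviour: substr returns std::string, so my_substr owns a copy
// and remains valid after the temporary is destroyed.
inline std::string copy_substr() {
    auto my_substr = std::string(some_fn()).substr(3, 4);
    static_assert(std::is_same_v<decltype(my_substr), std::string>);
    return my_substr;
}

// View-based equivalent: the owner must be a named object that lives
// at least as long as the view.
inline bool view_substr_matches() {
    const std::string owner(some_fn());
    const std::string_view view = std::string_view(owner).substr(3, 4);
    return view == "3456";
}
```

Here `copy_substr()` returns `"3456"`. If `substr` returned a view, the `static_assert` would fail, and without it the one-line form would leave `my_substr` pointing into a destroyed temporary.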

edited Oct 29 '20 at 09:40

answered Sep 04 '17 at 12:00

Arthur Tacca


Thanks, that's a real drawback! – geza Sep 04 '17 at 12:47
I'm not sure, why does the second version cause undefined behavior? Because of possibility that nullptr could be returned or what? – bielu000 Oct 28 '20 at 20:58
@bielu000 Technically the snippet as shown is OK, it's only if you use `my_substr` later that there's undefined behaviour, but I thought it was implicit that it would be given that there's a variable defined for it. I've edited my answer to make that clear. – Arthur Tacca Oct 29 '20 at 09:39
@bielu000 But in case you're still wondering why it's undefined behaviour: I'm talking about a hypothetical world where `.substr` returns a `string_view` instead of `string`, as the question asked about. So the expression `std::string(some_fn())` creates a `std::string` object and `.substr(3, 4)` returns a `std::string_view` that points into that string object. But once the statement completes, that `string` object is destroyed and its memory is released, while the `string_view` still points to it. That means any later use of `my_substr` will access memory that has already been freed. – Arthur Tacca Oct 29 '20 at 09:41

Lanting · Answer 4 · 2017-09-04T11:50:52.490

1

For one, the underlying data structure of a C++ string is kept mostly compatible with a C string (accessible through the c_str() member). C strings are null-terminated, so you basically just have a starting char pointer, and you keep incrementing it until it points to a 0 byte.

A substring could thus start at an arbitrary position of your original string. However, as you can't just insert a null somewhere in the original string, your substring would still need to end at the same position as the original.

--edit-- As John Zwinck pointed out, C++ strings can contain \0 chars. However, substrings returned as views would still lose their c_str member, because null-terminating them would require modifying the original string. This drawback of string_view was also noted in "Using std::string_view with api, what expects null terminated string".
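A small sketch of the null-termination issue, using a made-up C-style function `c_length` in place of a real C API. It assumes the input string contains no embedded null bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical C-style API, for illustration only.
inline std::size_t c_length(const char* s) { return std::strlen(s); }

// Current behaviour: substr returns a std::string, so c_str() is
// null-terminated right after the three requested characters.
inline std::size_t length_via_copy(const std::string& s) {
    return c_length(s.substr(0, 3).c_str());
}

// A view has no c_str(). Its data() points into the original buffer, so a
// C function reads past the end of the view, up to the terminator of `s`.
inline std::size_t length_via_view_data(const std::string& s) {
    const std::string_view view = std::string_view(s).substr(0, 3);
    return c_length(view.data());
}

// The fix when starting from a view: make an explicit copy first.
inline std::size_t length_via_explicit_copy(const std::string& s) {
    const std::string_view view = std::string_view(s).substr(0, 3);
    return c_length(std::string(view).c_str());
}
```

For `"abcdef"`, the copy-based versions report 3, while `length_via_view_data` reports 6. Reading through `data()` is well-defined here only because the view happens to point into a null-terminated `std::string`; a view over an arbitrary buffer gives no such guarantee.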

edited Sep 04 '17 at 11:50

answered Sep 04 '17 at 10:57

Lanting


C++ `std::string` can easily contain null bytes, not just at the end. It is emphatically not a C string, because it has a separate length field. – John Zwinck Sep 04 '17 at 11:00
@JohnZwinck It's true that the type `std::string` can contain embedded null bytes in general, but it's possible that a particular `std::string` variable could be guaranteed not to contain null bytes due to the surrounding application logic. In that situation, `fn_taking_c_str(my_var.substr(0, 3).c_str())` is code that would currently work, and would stop working if `substr()` were changed to return a `string_view` instead. (Although at least it would be a compilation error, and could be easily remedied by changing to `fn_taking_c_str(std::string(my_var.substr(0, 3)).c_str())`.) – Arthur Tacca Jun 01 '20 at 12:18
