110

Using http://en.cppreference.com/w/cpp/string/basic_string_view as a reference, I see no way to do this more elegantly:

std::string s = "hello world!";
std::string_view v = s;
v = v.substr(6, 5); // "world"

Worse, the naive approach is a pitfall and leaves v a dangling reference to a temporary:

std::string s = "hello world!";
std::string_view v(s.substr(6, 5)); // OOPS!

I seem to remember something like there might be an addition to the standard library to return a substring as a view:

auto v(s.substr_view(6, 5));

I can think of the following workarounds:

std::string_view(s).substr(6, 5);
std::string_view(s.data()+6, 5);
// or even "worse":
std::string_view(s).remove_prefix(6).remove_suffix(1);

Frankly, I don't think any of these are very nice. Right now the best thing I can think of is using aliases to simply make things less verbose.

using sv = std::string_view;
sv(s).substr(6, 5);
sehe
  • 374,641
  • 47
  • 450
  • 633
  • 6
    "I don't think any of these are very nice" What is wrong with the first one? Seems completely clear to me. Edit: BTW combining two methods that have clear meaning individually (`string_view(s).substr(...)`) seems nicer than a single function doing two things at once (`.substr_view(...)`) even if it did exist. – Arthur Tacca Sep 04 '17 at 09:20
  • 2
    @sehe: are you suggesting that `std::string::substr` should return a `std::string_view`? – geza Sep 04 '17 at 09:33
  • 1
    @geza I strongly seem to remember additions to `std::basic_string<>` interface that would add an operation like `subtr_view`, indeed. I mentioned that in the question. I was hoping someone would respond and say "That's proposal Nxxxx which got rejected/accepted into C++zz or TSn" – sehe Sep 04 '17 at 09:40
  • 1
    @ArthurTacca since the operation is so common, I think there should be a 1 step operation, possibly also more efficient. And certainly less error-prone: http://coliru.stacked-crooked.com/a/fd180519dc9b2f00 A free function is now the best we can do (in absense of https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax) – sehe Sep 04 '17 at 09:44
  • @sehe It comes down to a matter of opinion. In my opinion, the function you're referring to is essentially "make a string view from a string and make a substring from it", and I consider it cleaner to expose these two operations separately and let them be composed, so long as it is not significantly longer than calling a free function (which it isn't). – Arthur Tacca Sep 04 '17 at 09:53
  • @sehe The complaint you linked to wouldn't be fixed by adding a free function anyway. Instead you would need to *remove* the implicit constructor of `std::string_view` from `std::string` (but this would make it much more annoying to use in function calls). The real problem there is that returning a `string_view` is always dangerous and probably never advisable; it is exactly like returning a raw pointer, where you need to be sure you understand the lifetime of the pointed-to data. – Arthur Tacca Sep 04 '17 at 09:55
  • @ArthurTacca Of course. See [above](https://stackoverflow.com/questions/46032307/how-to-efficiently-get-a-string-view-for-a-substring-of-stdstring?noredirect=1#comment79028403_46032307) again. It's a preference. And I value my style. And I know why I value it (reduce errors, because the combined operation would be more expressive _and_ tdo the right thing). Re. "The real problem": [see here](https://stackoverflow.com/questions/46032307/how-to-efficiently-get-a-string-view-for-a-substring-of-stdstring?noredirect=1#comment79025792_46032475) – sehe Sep 04 '17 at 09:55
  • 5
    @sehe so the upshot of my alarmist ranting in the cpporg group is that no-one else seems to think that string_view is a problem. The answer seems to be, "just never return a string_view", adding string_view to the arbitrary list of classes that one should "just know" not to return. In which case the `std::string_view::substr()` method breaks its own rules, since it returns a string_view. So I guess the advice would be never to do this. Use a `std::string`. – Richard Hodges Sep 05 '17 at 07:56
  • @RichardHodges I think a better rule is "never return a `string_view` from a function that doesn't take a `string_view` as a parameter." Such functions are useful and safe IMO. – Jonathan Sharman Feb 17 '19 at 00:29
  • @JonathanSharman That rule needs to be a whole lot more accurate for it to work. But yeah, we need rules like that, and that's sad – sehe Feb 17 '19 at 15:25

4 Answers4

60

There's the free-function route, but unless you also provide overloads for std::string it's a snake-pit.

#include <string>
#include <string_view>

std::string_view sub_string(
  std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;

  // this is fine and elegant...
  auto bar = sub_string(source, 3);

  // but uh-oh...
  bar = sub_string("foobar"s, 3);
}

IMHO the whole design of string_view is a horror show which will take us back to a world of segfaults and angry customers.

update:

Even adding overloads for std::string is a horror show. See if you can spot the subtle segfault timebomb...

#include <string>
#include <string_view>

std::string_view sub_string(std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string&& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string const& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;
  auto bar = sub_string(std::string_view(source), 3);

  // but uh-oh...
  bar = sub_string("foobar"s, 3);
}

The compiler found nothing to warn about here. I am certain that a code review would not either.

I've said it before and I'll say it again, in case anyone on the c++ committee is watching, allowing implicit conversions from std::string to std::string_view is a terrible error which will only serve to bring c++ into disrepute.

Update

Having raised this (to me) rather alarming property of string_view on the cpporg message board, my concerns have been met with indifference.

The consensus of advice from this group is that std::string_view must never be returned from a function, which means that my first offering above is bad form.

There is of course no compiler help to catch times when this happens by accident (for example through template expansion).

As a result, std::string_view should be used with the utmost care, because from a memory management point of view it is equivalent to a copyable pointer pointing into the state of another object, which may no longer exist. However, it looks and behaves in all other respects like a value type.

Thus code like this:

auto s = get_something().get_suffix();

Is safe when get_suffix() returns a std::string (either by value or reference)

but is UB if get_suffix() is ever refactored to return a std::string_view.

Which in my humble view means that any user code that stores returned strings using auto will break if the libraries they are calling are ever refactored to return std::string_view in place of std::string const&.

So from now on, at least for me, "almost always auto" will have to become, "almost always auto, except when it's strings".

Community
  • 1
  • 1
Richard Hodges
  • 68,278
  • 7
  • 90
  • 142
  • 1
    Smart thinking with the implicit conversion of the argument to `string_view`. How about: http://coliru.stacked-crooked.com/a/8e353f2a8f584998 – sehe Sep 04 '17 at 07:53
  • 3
    @sehe updated with an even worse scenario - the result of good intentions going wrong. Expect to see this in a code base near you soon... – Richard Hodges Sep 04 '17 at 07:54
  • Agreed the second example is subtle, but luckily it has _nothing_ to do with the free function `sub_string`: http://coliru.stacked-crooked.com/a/7adf7a3cc661b173 – sehe Sep 04 '17 at 08:00
  • 10
    @sehe I find it astonishing that the committee did not see this coming. It seems for once they have valued the quick-fix performance improvement of existing algorithms over code-safety. I fear this is a grave error. – Richard Hodges Sep 04 '17 at 08:11
  • 4
    Who says they didn't see it coming? It must have been weighing options. I mean, `string foo(); bool bar(string_view); auto check = bar(foo());` is safe and reasonable to want to allow. – sehe Sep 04 '17 at 08:14
  • 3
    @sehe: What Richard Hodges shows is precisely the reason why implicit conversion to ``const char *`` was disallowed. The committee's reasoning is not consistent, and I fully agree with RH, is likely to cause many developers (especially newcomers) untold pain. – Arne Vogel Sep 04 '17 at 08:17
  • 6
    @sehe of course I understand the rationale. It's attractive. It could be achieved safely if string_view were non-copyable and non-moveable. Then you couldn't return one from a function and all would be well. – Richard Hodges Sep 04 '17 at 08:17
  • To recap: yeah the free function is probably the most elegant, if you account for the pitfalls. (The design issues with http://en.cppreference.com/w/cpp/string/basic_string/operator_basic_string_view not being suitably ref-qualified and/or `explicit` is a bit off topic, but thanks for the insights none-the-less!) – sehe Sep 04 '17 at 08:23
  • @RichardHodges: can you please explain where the time bomb is here? – geza Sep 04 '17 at 08:40
  • 7
    @geza your question validates my concern very nicely. The second assignment to `bar` is a `string_view` constructed from a `string_view` constructed from a temporary `std::string`, which has now been destructed. – Richard Hodges Sep 04 '17 at 08:45
  • @RichardHodges No. It's a string_view constructed from a from a temporary string, directly. Which is why I posted my de-mystifying version: http://coliru.stacked-crooked.com/a/7adf7a3cc661b173 (`sub_string("foobar"s, 3)` returns another temporary `std::string`, correctly!). See how it's not related to your `::sub_string` at all :) – sehe Sep 04 '17 at 08:56
  • 1
    @RichardHodges: thanks for the answer! But I still fail to see it. Where is std::string_view involved here? Maybe I fail to see something very obvious here. Where do I think wrong? `source` and `bar` is `std::string`. At the second `bar` assignment, a temp `std::string` constructed, and the second `sub_string` is called, which returns a `std::string`. No `std::string_view` here... – geza Sep 04 '17 at 08:58
  • 1
    @geza hah - you're right. I've fixed the bug inadvertently. – Richard Hodges Sep 04 '17 at 09:03
  • `sub_string("foobar"s, 3)` how can you expect anything other than trouble by constructing a view into a temporary object? – einpoklum Sep 04 '17 at 12:31
  • 1
    @einpoklum with std::string this is perfectly reasonable, it returns a new string. – Richard Hodges Sep 04 '17 at 12:56
  • 1
    I think we could remove a whole bunch of these comments, if you put in a section at the end (using "spoiler" markup) which explained the problem. – Martin Bonner supports Monica Sep 04 '17 at 13:20
  • 1
    @RichardHodges: Well, a string view is not a string. So, are you saying it's the name that throws you off? Would you be ok if the function were called "sub_string_view"? – einpoklum Sep 04 '17 at 14:33
  • 2
    @einpoklum The problem isn't that hard to understand, but first it would have to _be understood_ by basically all programmers __and__ it would have to be constantly in their minds. That, I don't think is very safe. It would be akin to teaching pointers all over again. – Passer By Sep 04 '17 at 14:38
  • 3
    @MartinBonner I'd be happy if all the non-related design rant bits go. My first comment here shows a single line [`=delete` declaration that handles](http://coliru.stacked-crooked.com/a/8e353f2a8f584998) that tangential issue. – sehe Sep 04 '17 at 22:12
  • 1
    @RichardHodges, .... Great catch!.. Time for one additional entry in the CppCoreGuidlines :-). And time to once again, make the compiler treat some `std::` types specially. haha... Maybe we should add a new GCC `W` flag, ditto Clang: `-Wtemporary-basic_string-to-basic_string_view` ...:-) – WhiZTiM Sep 05 '17 at 07:37
  • 2
    @WhiZTiM how about this in the core guidelines, "thou shalt not create objects that pretend to be values in a pointless quest for nice-looking code. Pointers are pointers, values are values." – Richard Hodges Sep 05 '17 at 07:51
  • 1
    @RichardHodges come on. The name is `string_view`. ***View***. – sehe Sep 05 '17 at 15:36
  • 3
    @sehe before c++11 I would have agreed with you. But now its name is `auto`. **auto**. See? no clues at all by looking at user code as to whether it's a safe value-object or a dangerous reference object. – Richard Hodges Sep 05 '17 at 15:54
  • 1
    Well. Down that road lies madness. `auto` is a tool, not a requirement. Besides it's really off topic. The same argument goes for any 'proper' non-value type. Meanwhile the name of `string_view` is really not `auto` by any stretch of imagination. – sehe Sep 05 '17 at 15:58
  • What does character *s* do at the end of this expression auto source = "foobar"s ? – bigdata2 Apr 08 '20 at 19:06
  • @bigdata2 it constructs a std::string - https://en.cppreference.com/w/cpp/string/basic_string/operator%22%22s – Richard Hodges Apr 09 '20 at 05:58
  • `// this is fine and elegant... auto bar = sub_string(source, 3);` This is NOT fine. You're returning a pointer to a temporary that you created in sub_string(): `return s.substr(p, n);`. That is going to explode 100%. – Josh Sanders Jan 06 '22 at 02:00
54

You can use the conversion operator from std::string to std::string_view:

std::string s = "hello world!";
std::string_view v = std::string_view(s).substr(6, 5);
CAF
  • 1,090
  • 9
  • 6
  • This approach is not obviously efficient (i.e. does it do the job without allocating an extra temporary buffers?), so a comment supporting the claim to efficiency would be good. In fact, it looks like this approach is very contra-indicated because `v` points at a temporary buffer that immediately becomes invalid. – John Doggett Jul 26 '20 at 23:03
  • 1
    @JohnDoggett: What are you talking about? Why should there be a temporary buffer? – MikeMB Jan 03 '21 at 09:53
  • 4
    Imho this is a far better solution than the accepted one. – MikeMB Jan 03 '21 at 09:54
  • `.substr(6, 5)` allocates a new `std::string` but it is a temporary and goes away at the `;`, leaving the `v` pointing at a buffer that has been cleared. – caps Jan 11 '21 at 20:45
  • 4
    @caps no, [`string_view::substr`](https://en.cppreference.com/w/cpp/string/basic_string_view/substr) returns a view based on the original view. Perhaps it would have been less confusing if they'd called it `subview`? – M.M Jan 27 '21 at 09:53
  • 1
    @M.M you are correct. I thought the code sample was calling `string::substr`. – caps Feb 03 '21 at 21:36
  • I think this is the best practice. I think if you would like to explain in your answer that your solution need no extra allocation and copy, your answer will get much more attention and benefit more people. – LI Qimai Sep 17 '21 at 09:23
5

This is how you can efficiently create a sub-string string_view.

#include <string>
inline std::string_view substr_view(const std::string& source, size_t offset = 0,
                std::string_view::size_type count = 
                std::numeric_limits<std::string_view::size_type>::max()) {
    if (offset < source.size()) 
        return std::string_view(source.data() + offset, 
                        std::min(source.size() - offset, count));
    return {};
}

#include <iostream>
int main(void) {
  std::cout << substr_view("abcd",3,11) << "\n";

  std::string s {"0123456789"};
  std::cout << substr_view(s,3,2) << "\n";

  // be cautious about lifetime, as illustrated at https://en.cppreference.com/w/cpp/string/basic_string_view
  std::string_view bad = substr_view("0123456789"s, 3, 2); // "bad" holds a dangling pointer
  std::cout << bad << "\n"; // possible access violation

  return 0;
}
Orwellophile
  • 13,235
  • 3
  • 69
  • 45
Alexander
  • 400
  • 7
  • 16
  • It might be a good idea to highlight a "bad use", as the given example could easily evolve into one. Proposed an edit to do just that. – John Doggett Jul 26 '20 at 22:58
  • Prevent the bad case from happening -> std::string_view substr_view(std::string const&&, etc.) = delete; – Mercury Dime Dec 05 '20 at 14:25
  • This one should have been marked the correct answer, its the only correct answer on this page. You cannot create a string_view from `substr` unless you want to keep that substr in memory for the lifetime of your string_view. The accepted answer below is completely incorrect. – Josh Sanders Jan 06 '22 at 02:02
0

I realize that the question is about C++17, but it's worth noting that C++20 introduced a string_view constructor that accepts two iterators to char (or whatever the base type is) which allows writing

std::string_view v{ s.begin() +6, s.begin()+6 +5 };

Not sure if there is a cleaner syntax, but it's not difficult to

#define RANGE(_container,_start,_length) (_container).begin() + (_start), (_container).begin() + (_start) + (_length)

for a final

std::string_view v{ RANGE(s,6,5) };

PS: I called RANGE's first parameter _container instead of _string for a reason: the macro can be used with any Container (or class supporting at least begin() and end()), even as part of a function call like

auto pisPosition= std::find( RANGE(myDoubleVector,11,23), std::numbers::pi );

PPS: When possible, prefer C++20's actual ranges library to this poor person's solution.

Mario Rossi
  • 7,651
  • 27
  • 37