2

In the following article under the section "Ownership Issues", it mentions that if someone sets up a std::string_view to look at a std::string variable then leaves the scope of that std::string, the std::string_view's behavior will be undefined.

https://www.learncpp.com/cpp-tutorial/6-6a-an-introduction-to-stdstring_view/

I understand that this is caused by the fact that the std::string variable is no longer in scope and "dies", so what's being viewed by the std::string_view is undefined. How would one work around this?

My current function does something to this effect:

std::string_view mergeIntoSV(std::string str1, std::string str2) {
  std::string new_str = str1 + str2;
  return std::string_view { new_str };
}

void callMerge() {
  std::string list[10];
  std::string_view views[5];
  //Assume some code that populates "list"...
  for (int i = 0; i < 10; i+=2) {
    views[i/2] = mergeIntoSV(list[i], list[i+1]);
  }
  //Now I want to be able to read each of the std::string_views.
}

However, after invoking this function elsewhere and then reading the contents of returned std::string_view, I (as the article suggests) read some undefined string value. Unfortunately I can't simply change the return type to std::string due to the fact that I'm working a larger codebase and required the return type to be as is.

How does one work around this out-of-scope problem?

  • 3
    Simple: return a `std::string` instead of a `std::string_view` – Indiana Kernick Mar 06 '20 at 00:50
  • For the purposes of this, I've abstracted out a lot of code and I have to return a std::string_view. I'm working inside a larger code base which requires this. – Hunter Coffman Mar 06 '20 at 00:50
  • 3
    Those sort of requirements are worth mentioning in the question. An alternative solution is to return a `std::string_view` onto a `static std::string` (add 7 characters to your example). – Indiana Kernick Mar 06 '20 at 00:53
  • If I understand the purpose of `static` correctly, that would mean that `new_str` would not get destroyed until the program ended. If I called the function multiple times and then reference each `std:string_view`, each would be viewing the result of the most recent function call, right? If I had to call the function multiple times over and wanted to preserve each individual one, is there a way to accomplish that? Thanks for your help, by the way, I really appreciate it. (I'll update post to be more clear) – Hunter Coffman Mar 06 '20 at 01:02
  • 8
    Unless you're not telling us something important, that function is fundamentally broken. – zdan Mar 06 '20 at 01:05
  • 2
    @HunterCoffman: There is no reason the code you posted has to use `string_view`. – Nicol Bolas Mar 06 '20 at 01:07
  • 1
    Why does `views` need to be an array of `std::string_view`? – Indiana Kernick Mar 06 '20 at 01:07
  • As a suggestion, Maybe you need to use "lifetime" instead of "scope"? – con ko Mar 06 '20 at 01:10
  • @HunterCoffman If you are concerning about this, then there's no need to use `std::string_view` in your code. – con ko Mar 06 '20 at 01:12
  • I appreciate everyone's feedback. It makes a lot of sense to me why it doesn't work, but it seems to be the problem is the constraints put upon me by the codebase I'm working in. I'll have to talk with the designers as to why they had to use `std::string_view`. I understand the code I posted doesn't require it, but that's because I think typing out the entire thing would be useful. What I'm actually trying to do is store a local string into an object that has an attribute that is of type `std::string_view`. I'll have to take with the designers more, but thanks again everyone. – Hunter Coffman Mar 06 '20 at 01:12
  • 1
    @HunterCoffman is this a free function or a member function in your real codebase? – Ryan Haining Mar 06 '20 at 01:14
  • This is a member function of the codebase. The class I'm trying to interface with requires a single argument in its constructor that is of type `std::string_view`, but I have a locally modified string that I was trying to use as the constructor argument. – Hunter Coffman Mar 06 '20 at 01:16
  • 4
    The information in your most recent comments is definitely worth mentioning in the question as it completely changes the sort of answers you'll likely receive. This seems like an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Indiana Kernick Mar 06 '20 at 01:17

3 Answers3

8

You are creating an string object in a conceptually pure function (no state outside of the inputs/outputs). You then decided that this pure function should return a non-owning view of this object. This is a contradiction: a pure function cannot own state after the function has completed execution. And if the function doesn't own that state, and the return value doesn't own that state... the state doesn't exist.

You have only two options: return something that actually owns the string, or make the function non-pure by having it creating a string which outlives the function call (this could be through a static local, a global, or any number of things). The latter is almost always the wrong answer.

If some larger codebase required this operation, then that larger codebase has a design problem that needs to be resolved.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Okay, thanks for your response, this makes a lot of sense. I'll have to discuss this with some of the designers to see why they did it this certain way. Thanks again for your input. – Hunter Coffman Mar 06 '20 at 01:09
  • 3
    @HunterCoffman: This answer is correct assuming that the function is not a member. Since you've clarified that the function _is_ a member, that's _critical_ information that changes everything. Please post a *new* question with this critical information to get a answer more applicable to what your actual needs are. – Mooing Duck Mar 06 '20 at 01:24
2

A string view isn't magical: Essentially, it's just a pointer to the start of the string, and its length. Just like you wouldn't return a pointer to a local variable, you wouldn't return an std::string_view to a local string. It's that simple.

The "solution" is the same as for pointers:

  • If the (string) object should be local - don't return a pointer (string view) to it.
  • If the (string) object lifetime should to extend the lifetime - either:
    • return by value (there's guaranteed return-value copy elision and NRVO), or
    • create a buffer through, say, an std::unique_ptr, and then move-return it, or
    • pass a buffer to work with as input, and use that for your string data.
einpoklum
  • 118,144
  • 57
  • 340
  • 684
1

Unfortunately I can't simply change the return type to std::string due to the fact that I'm working a larger codebase and required the return type to be as is.

Why is it that the "architects" who clearly have rather poor understanding of the language insist that the broken design that results from their misunderstanding shall be borne by people who actually care and want to learn? "required"... I'm just shaking my head. What a mess. The larger codebase you're working with is probably a steaming pile of poo. We can't even be sure. After all, piles of poo have some uses in farming and whatnot. Whereas such codebases mostly yield gnashing of teeth, burnout, etc... :(

This is a member function of the codebase. The class I'm trying to interface with requires a single argument in its constructor that is of type std::string_view, but I have a locally modified string that I was trying to use as the constructor argument.

There's no problem there. Taking a string_view as a constructor argument is pretty typical. And, pretty typically, outside of some specialized view classes, this constructor argument is meant to initialize a std::string member!

If the class you're talking about is storing the view, then it's broken, unless its purpose was to be a view class, i.e. a class that acts as a short-lived reference whose initializers must outlive it. For example, this would be valid:

class ConsumerView {
public:
  explicit ConsumerView( std::string_view str ) : m_view(str) {}
private:
  std::string_view m_view;
};

void test() {
  std::string str(mergeIntoSv("a", "b"));
  ConsumerView cview(str);
  ...
}

This is valid only because the owner of the viewed data - str - outlives the view cview.

Anything other use of ConsumerView would be broken by design and couldn't be made to work without terrible workarounds that will completely erase any purported gains that were envisioned by the use of string_view.

If the consumer class you have in mind is not meant as a short lived view, then it absolutely must be taking a copy of the view it was passed.

Suppose the class you're talking about looks like:

class Consumer {
  [...]
public:
  explicit Consumer(std::string_view str);
};

The only valid thing to do with this str argument is to use it to initialize some storage and keep a copy of the contents there. Typically such storage would be std::string but it could be std::vector<char> or what have you:

class Consumer {
public:
  explicit Consumer(std::string_view str) : m_str(str) {}
private:
  std::string m_str;
};

And the whole point of string_view is that you can use strings to initialize it. So, mergeIntoSV is pointless. I presume you thought you needed the signature of mergeIntoSV to be the way it is for the following purpose:

Consumer c1(mergeIntoSV(str1, str2));

You absolutely don't need that. The signature of mergeIntoSV as you show is back-asswards. It should return a std::string and accept two std::string_views:

std::string merge(std::string_view str1, std::string_view str2)
{
   std::string result;
   result.reserve( str1.size() + str2.size() );
   result.append(str1).append(str2);
   return result;
}

This is the only sensible way I know of implementing this function. It must return a container that owns the merged data, there's no other way around it. It also must reserve adequate space in the container it returns, to avoid reallocations. Otherwise you could forget about this method and simply write std::string(str1) + str2. It would potentially allocate twice instead of just once, but the outcome of the expression is the same: an rvalue that is the string that is the concatenation of str1 and str2.

At this point, the following is perfectly valid:

Consumer c1(merge(str1, str2));

There's no undefined behavior, since std::string returned by merge outlives the full expression, i.e. it will only be destroyed after c1's constructor has finished and used the view to construct an internal copy of the string in c1.

One thing you can then do to Consumer is to accept rvalue strings:

class Consumer2 {
public:
  explicit Consumer2(std::string_view str) : m_str(str) {}
  explicit Consumer2(std::string&& str) : m_str(std::move(str)) {}
private:
  std::string m_str;
};

Then, there are no copies:

Consumer2 c2(merge(str1, str2));

Since the result of merge is not named, it is an rvalue, so the Consumer2::Consumer2(std::string&&) constructor gets invoked, and if the result of merge had allocated a memory block, that block will be moved into c2.m_str, at little cost, with no memory allocations/deallocations nor large copies.

An Aside
Now, most std::string implementations provide a small-string optimization. That's when short strings can be retained within the std::string object itself, without allocating any additional memory. Thus, even when move-constructing or move-assigning those small strings, the small string data will be copied. But it's absolutely inconsequential.

In all cases I know of, std::string occupies at most one cacheline (64 bytes typically). On all modern CPUs, the cost of copying a full cacheline is "same" as the cost of copying just one byte - in fact, copying just one byte may cost a little bit more, depending on implementation details. And if that one byte's selection may involve a branch misprediction, you could copy a dozen cachelines in the time it took to do all the work of copying just one byte... So, the small string copy that std::string::string may do for small strings is basically always free.

Now, it may well be that the mergeInto function is a method of a class that can carry data. The property/attribute that mergeInto really merges into would then be a member of that class. For example:

class Carrier {
public:
  std::string_view mergeInto(std::string_view str1, std::string_view str2);
  std::string_view value() const;
private:
  std::string m_value;
};

std::string_view Carrier::mergeInto(std::string_view str1, std::string_view str2)
{
   m_value.clear();
   m_value.reserve( str1.size() + str2.size() );
   m_value.append(str1).append(str2);
   return m_value;
}

std::string_view Carrier::value() const
{
   return m_value;
}

This is valid, there's no undefined behavior: mergeInto modifies the state of the object, and returns a view of the modified property. Such view can also be obtained without modifications through value().

There is some debate whether views into an object's string-typed state should be returned via const std::string & or std::string_view. Either one acts like a reference, and must not outlive the carrier object. Typically, you'd want to copy it into a std::string if the value had to be passed out of the e function where mergeInto() or value() were invoked:

std::string carrierUser(const Carrier& carrier)
{
  auto val_ref = carrier.value();
  ...
  return val_ref;
}

It is usually a bad idea to propagate references through general-purpose functions, especially if there's nothing obvious about them that screams "this must be just propagating a reference". Say, with std::ostream& operator<<(std::ostream&, T) you know that the use case demands that the stream reference is "threaded through" the operators. With a free function, or an unrelated method, it may not be so.

It seems to me that the design of the project you're working on was concocted by someone very eager to use the newfangled std::string_view but without much understanding of what problem that class was meant to solve.

And the primary reasons for std::string_view were:

  1. Passing a "universal reference to a string-like object" into methods and functions:

    Before:

    void fun1(const char* str);
    void fun1(const std::string& str);
    

    After:

    void fun1(std::string_view str);
    

    There's no magic here: the view is still a reference that must not outlive the owner of the string data.

  2. Generating transient (temporary, short-lived) views of substrings:

    Before - naive - two temporary strings have to be allocated:

    std::string join_first_n(std::size_t count, const std::string& str1, const std::string& str2)
    {
       return str1.substr(0, count) + str2.substr(0, count);
    }
    

    Note: Some std::string implementations were "clever" and had expression template overloads designed to capture the "shape" of the concatenation expression and perform the necessary optimizations, such as pre-reserving sufficient space to fit all pieces to be concatenated, etc. But this was not always available, and it contributed to longer compile times...

    Midway - no temporaries, but inputs must be strings:

    std::string join_first_n(std::size_t count, const std::string& str1, const std::string& str2)
    {
       std::string result;
       auto size1 = std::min(count, str1.size());
       auto size2 = std::min(count, str2.size());
       result.reserve( size1 + size2 );
       result.append(str1.data(), size1).append(str2.data(), size2);
       return result;
    }
    

    std::string has to have overloads galore for methods that take strings as input. Since append's design predates the string_view, there must be, for efficiency's sake, the following two overloads:

     std::string& append(const char*);    
     std::string& append(const char*, std::size_t);
    

    The first overload has to measure the size of the input string at runtime using e.g. strlen, then pass this information to the second overload... What a mess. Both would be solved by:

     std::string& append(std::string_view);
    

    After - no temporaries, inputs can be substrings already:

    std::string join_first_n(std::size_t count, std::string_view str1, std::string_view str2)
    {
       std::string result;
       auto size1 = std::min(count, str1.size());
       auto size2 = std::min(count, str2.size());
       result.reserve( size1 + size2 );
       result.append(str1.data(), size1).append(str2.data(), size2);
       return result;
    }
    
    void test()
    {
       assert(join_first_n(2, "abcd", "e") == "abe");
    }
    

    Note that the string literals "abcd" and "efg" did not get their lengths counted at runtime, and were never copied. std::string_view captures the size of string literals at compile time :)

Kuba hasn't forgotten Monica
  • 95,931
  • 16
  • 151
  • 313