Unfortunately I can't simply change the return type to std::string due to the fact that I'm working a larger codebase and required the return type to be as is.
Why is it that the "architects" who clearly have rather poor understanding of the language insist that the broken design that results from their misunderstanding shall be borne by people who actually care and want to learn? "required"... I'm just shaking my head. What a mess. The larger codebase you're working with is probably a steaming pile of poo. We can't even be sure. After all, piles of poo have some uses in farming and whatnot. Whereas such codebases mostly yield gnashing of teeth, burnout, etc... :(
This is a member function of the codebase. The class I'm trying to interface with requires a single argument in its constructor that is of type std::string_view
, but I have a locally modified string that I was trying to use as the constructor argument.
There's no problem there. Taking a string_view
as a constructor argument is pretty typical. And, pretty typically, outside of some specialized view classes, this constructor argument is meant to initialize a std::string
member!
If the class you're talking about is storing the view, then it's broken, unless its purpose was to be a view class, i.e. a class that acts as a short-lived reference whose initializers must outlive it. For example, this would be valid:
class ConsumerView {
public:
explicit ConsumerView( std::string_view str ) : m_view(str) {}
private:
std::string_view m_view;
};
void test() {
std::string str(mergeIntoSv("a", "b"));
ConsumerView cview(str);
...
}
This is valid only because the owner of the viewed data - str
- outlives the view cview
.
Anything other use of ConsumerView
would be broken by design and couldn't be made to work without terrible workarounds that will completely erase any purported gains that were envisioned by the use of string_view
.
If the consumer class you have in mind is not meant as a short lived view, then it absolutely must be taking a copy of the view it was passed.
Suppose the class you're talking about looks like:
class Consumer {
[...]
public:
explicit Consumer(std::string_view str);
};
The only valid thing to do with this str
argument is to use it to initialize some storage and keep a copy of the contents there. Typically such storage would be std::string
but it could be std::vector<char>
or what have you:
class Consumer {
public:
explicit Consumer(std::string_view str) : m_str(str) {}
private:
std::string m_str;
};
And the whole point of string_view
is that you can use strings to initialize it. So, mergeIntoSV
is pointless. I presume you thought you needed the signature of mergeIntoSV
to be the way it is for the following purpose:
Consumer c1(mergeIntoSV(str1, str2));
You absolutely don't need that. The signature of mergeIntoSV
as you show is back-asswards. It should return a std::string
and accept two std::string_view
s:
std::string merge(std::string_view str1, std::string_view str2)
{
std::string result;
result.reserve( str1.size() + str2.size() );
result.append(str1).append(str2);
return result;
}
This is the only sensible way I know of implementing this function. It must return a container that owns the merged data, there's no other way around it. It also must reserve adequate space in the container it returns, to avoid reallocations. Otherwise you could forget about this method and simply write std::string(str1) + str2
. It would potentially allocate twice instead of just once, but the outcome of the expression is the same: an rvalue that is the string that is the concatenation of str1 and str2.
At this point, the following is perfectly valid:
Consumer c1(merge(str1, str2));
There's no undefined behavior, since std::string
returned by merge
outlives the full expression, i.e. it will only be destroyed after c1
's constructor has finished and used the view to construct an internal copy of the string in c1
.
One thing you can then do to Consumer
is to accept rvalue strings:
class Consumer2 {
public:
explicit Consumer2(std::string_view str) : m_str(str) {}
explicit Consumer2(std::string&& str) : m_str(std::move(str)) {}
private:
std::string m_str;
};
Then, there are no copies:
Consumer2 c2(merge(str1, str2));
Since the result of merge
is not named, it is an rvalue, so the Consumer2::Consumer2(std::string&&)
constructor gets invoked, and if the result of merge
had allocated a memory block, that block will be moved into c2.m_str
, at little cost, with no memory allocations/deallocations nor large copies.
An Aside
Now, most std::string
implementations provide a small-string optimization. That's when short strings can be retained within the std::string
object itself, without allocating any additional memory. Thus, even when move-constructing or move-assigning those small strings, the small string data will be copied. But it's absolutely inconsequential.
In all cases I know of, std::string
occupies at most one cacheline (64 bytes typically). On all modern CPUs, the cost of copying a full cacheline is "same" as the cost of copying just one byte - in fact, copying just one byte may cost a little bit more, depending on implementation details. And if that one byte's selection may involve a branch misprediction, you could copy a dozen cachelines in the time it took to do all the work of copying just one byte... So, the small string copy that std::string::string
may do for small strings is basically always free.
Now, it may well be that the mergeInto
function is a method of a class that can carry data. The property/attribute that mergeInto
really merges into would then be a member of that class. For example:
class Carrier {
public:
std::string_view mergeInto(std::string_view str1, std::string_view str2);
std::string_view value() const;
private:
std::string m_value;
};
std::string_view Carrier::mergeInto(std::string_view str1, std::string_view str2)
{
m_value.clear();
m_value.reserve( str1.size() + str2.size() );
m_value.append(str1).append(str2);
return m_value;
}
std::string_view Carrier::value() const
{
return m_value;
}
This is valid, there's no undefined behavior: mergeInto
modifies the state of the object, and returns a view of the modified property. Such view can also be obtained without modifications through value()
.
There is some debate whether views into an object's string-typed state should be returned via const std::string &
or std::string_view
. Either one acts like a reference, and must not outlive the carrier object. Typically, you'd want to copy it into a std::string
if the value had to be passed out of the e function where mergeInto()
or value()
were invoked:
std::string carrierUser(const Carrier& carrier)
{
auto val_ref = carrier.value();
...
return val_ref;
}
It is usually a bad idea to propagate references through general-purpose functions, especially if there's nothing obvious about them that screams "this must be just propagating a reference". Say, with std::ostream& operator<<(std::ostream&, T)
you know that the use case demands that the stream reference is "threaded through" the operators. With a free function, or an unrelated method, it may not be so.
It seems to me that the design of the project you're working on was concocted by someone very eager to use the newfangled std::string_view
but without much understanding of what problem that class was meant to solve.
And the primary reasons for std::string_view
were:
Passing a "universal reference to a string-like object" into methods and functions:
Before:
void fun1(const char* str);
void fun1(const std::string& str);
After:
void fun1(std::string_view str);
There's no magic here: the view is still a reference that must not outlive the owner of the string data.
Generating transient (temporary, short-lived) views of substrings:
Before - naive - two temporary strings have to be allocated:
std::string join_first_n(std::size_t count, const std::string& str1, const std::string& str2)
{
return str1.substr(0, count) + str2.substr(0, count);
}
Note: Some std::string
implementations were "clever" and had expression template overloads designed to capture the "shape" of the concatenation expression and perform the necessary optimizations, such as pre-reserving sufficient space to fit all pieces to be concatenated, etc. But this was not always available, and it contributed to longer compile times...
Midway - no temporaries, but inputs must be strings:
std::string join_first_n(std::size_t count, const std::string& str1, const std::string& str2)
{
std::string result;
auto size1 = std::min(count, str1.size());
auto size2 = std::min(count, str2.size());
result.reserve( size1 + size2 );
result.append(str1.data(), size1).append(str2.data(), size2);
return result;
}
std::string
has to have overloads galore for methods that take strings as input. Since append
's design predates the string_view
, there must be, for efficiency's sake, the following two overloads:
std::string& append(const char*);
std::string& append(const char*, std::size_t);
The first overload has to measure the size of the input string at runtime using e.g. strlen
, then pass this information to the second overload... What a mess. Both would be solved by:
std::string& append(std::string_view);
After - no temporaries, inputs can be substrings already:
std::string join_first_n(std::size_t count, std::string_view str1, std::string_view str2)
{
std::string result;
auto size1 = std::min(count, str1.size());
auto size2 = std::min(count, str2.size());
result.reserve( size1 + size2 );
result.append(str1.data(), size1).append(str2.data(), size2);
return result;
}
void test()
{
assert(join_first_n(2, "abcd", "e") == "abe");
}
Note that the string literals "abcd" and "efg" did not get their lengths counted at runtime, and were never copied. std::string_view
captures the size of string literals at compile time :)