66

I'm trying to use std::string instead of char* whenever possible, but I worry I may be degrading performance too much. Is this a good way of returning strings (no error checking for brevity)?

std::string linux_settings_provider::get_home_folder() {
    return std::string(getenv("HOME"));
}

Also, a related question: when accepting strings as parameters, should I receive them as const std::string& or const char*?

Thanks.

Pedro d'Aquino

12 Answers

64

Return the string.

I think the better abstraction is worth it. Until you can measure a meaningful performance difference, I'd argue that it's a micro-optimization that only exists in your imagination.

It took many years to get a good string abstraction into C++. I don't believe that Bjarne Stroustrup, so famous for his conservative "only pay for what you use" dictum, would have permitted an obvious performance killer into the language. Higher abstraction is good.
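As a minimal sketch of the by-value return, here is the question's function with the null check it left out. The name `env_or` and its `fallback` parameter are hypothetical, introduced only for illustration. `std::getenv` returns a null pointer when the variable is unset, and constructing a `std::string` from a null pointer is undefined behavior, so the check is not optional.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the question): returns the value of the
// environment variable `name` by value, or `fallback` if it is not set.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        // Never pass a null pointer to the std::string constructor.
        return fallback;
    }
    return std::string(value);
}
```

A call such as `env_or("HOME", "/")` then always yields a valid string, whether or not the variable exists.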

duffymo
  • Thanks. I was a bit afraid it was considered bad practice, but I'm glad to see it isn't :-) – Pedro d'Aquino Jun 23 '09 at 12:39
  • 3
    Remember you can always use references where appropriate to avoid unneeded copies. I try to have input parameters as "const std::string&" where possible – ShoeLace Jun 23 '09 at 13:37
  • 4
    "It took many years to get a good string abstraction into C++." IMHO it still sucks. – Johan Kotlinski Jun 23 '09 at 13:51
  • How so? Still an improvement over char *. – duffymo Jun 23 '09 at 14:30
  • 12
    I don't think that allowing the perfect to be the enemy of the good is a wise strategy. Waiting for perfect software isn't the answer. – duffymo Jun 23 '09 at 14:31
  • I sometimes accept a const char* as a parameter but return std::string. It's a preference for me, and seems to make the code a little easier to work with, though there's no reason I can think of why you couldn't wrap your const char* in a std::string and pass it as a parameter. I guess it's up to you on how you want to do it (though returning a const char* is generally not wise IMO). – David Peterson Dec 25 '12 at 05:42
  • Isn't it better for member functions or static variables to return a (potentially const) reference to the `std::string` instead? – KeyC0de Apr 06 '22 at 20:31
  • A const std::string would be a good idea. 13 years later - an improvement. – duffymo Apr 06 '22 at 21:21
14

Return the string, like everyone says.

when accepting strings as parameters, should I receive them as const std::string& or const char*?

I'd say take any const parameters by reference, unless either they're lightweight enough to take by value, or in those rare cases where you need a null pointer to be a valid input meaning "none of the above". This policy isn't specific to strings.

Non-const reference parameters are debatable, because from the calling code (without a good IDE), you can't immediately see whether they're passed by value or by reference, and the difference is important. So the code may be unclear. For const params, that doesn't apply. People reading the calling code can usually just assume that it's not their problem, so they'll only occasionally need to check the signature.

In the case where you're going to take a copy of the argument in the function, your general policy should be to take the argument by value. Then you already have a copy you can use, and if you would have copied it into some specific location (like a data member) then you can move it (in C++11) or swap it (in C++03) to get it there. This gives the compiler the best opportunity to optimize cases where the caller passes a temporary object.

For string in particular, this covers the case where your function takes a std::string by value, and the caller specifies as the argument expression a string literal or a char* pointing to a nul-terminated string. If you took a const std::string& and copied it in the function, that would result in the construction of two strings.
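A minimal sketch of the take-by-value-then-move policy described above, assuming C++11. The class `user` and its member names are hypothetical.

```cpp
#include <string>
#include <utility>

// Hypothetical class that keeps its own copy of the argument.
class user {
public:
    // The parameter is taken by value. An lvalue argument is copied once into
    // `name`; a temporary or a string literal is constructed or moved into it.
    // Either way, `name` is then moved (not copied) into the data member.
    explicit user(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

With `user a("alice");` only one string buffer is built from the literal. A `const std::string&` parameter would first build a temporary from the literal and then copy it into the member.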

Steve Jessop
12

The cost of copying strings by value varies based on the STL implementation you're working with:

  • std::string under MSVC uses the short string optimisation, so that short strings (< 16 characters iirc) don't require any memory allocation (they're stored within the std::string itself), while longer ones require a heap allocation every time the string is copied.

  • std::string under GCC uses a reference counted implementation: when constructing a std::string from a char*, a heap allocation is done every time, but when passing by value to a function, a reference count is simply incremented, avoiding the memory allocation.

In general, you're better off just forgetting about the above and returning std::strings by value, unless you're doing it thousands of times a second.

re: parameter passing, keep in mind that there's a cost from going from char*->std::string, but not from going from std::string->char*. In general, this means you're better off accepting a const reference to a std::string. However, the best justification for accepting a const std::string& as an argument is that then the callee doesn't have to have extra code for checking vs. null.
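To make the trade-off concrete, here is a small sketch with two hypothetical functions, `length_of` and `length_of_cstr`, that do the same job through each kind of parameter.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Const reference: the callee can never receive a null pointer, and the
// length is already known. A caller holding only a const char* pays for a
// temporary std::string at the call site.
std::size_t length_of(const std::string& s) {
    return s.size();
}

// Pointer: a caller holding a std::string passes s.c_str() at no extra cost,
// but the callee has to handle null and recompute the length.
std::size_t length_of_cstr(const char* s) {
    return s != nullptr ? std::strlen(s) : 0;
}
```

Which one is cheaper depends on what callers usually hold; the reference version mainly buys the absence of the null case.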

jskinner
  • 1
    Wouldn't this mean I'm better off accepting a const char*? If my client has a std::string he can c_str() it, which, as you said, doesn't cost much. On the other hand, if he has a char*, he's forced to build a std::string. – Pedro d'Aquino Jun 23 '09 at 14:06
  • 1
    Brian: GCC most certainly does use a reference counted string implementation, have a read of /usr/include/c++/4.3/bits/basic_string.h, for example. – jskinner Jun 23 '09 at 22:32
  • Pedro: If you're writing a function that only needs a const char*, then yes, you're clearly better off accepting a const char*. If the function needs it as a std::string, then it's better off as that. My comment was more in relation to the cases where you don't know which you need (e.g., when writing an interface class). – jskinner Jun 23 '09 at 22:35
  • @Brian - RTFCode, it's plain as day. GCC still uses reference-counting. – Tom Jun 24 '09 at 01:28
  • Wow, I was totally wrong. Sorry about that. I recall reading an in-depth article about the failures of reference counted strings, and how that it is actually more efficient to go with a non-referenced counted solution. I must have dreamed it all. – Brian Neal Jun 24 '09 at 21:33
  • It looks like the implementation of std::string in gcc changed in version 5, so that it no longer uses reference counting and uses the short string optimization like MSVC. Search for std::string in https://gcc.gnu.org/gcc-5/changes.html. – Andrew Bainbridge Jul 26 '15 at 18:48
10

Seems like a good idea.

If this is not part of real-time software (like a game) but a regular application, you should be more than fine.

Remember, "Premature optimization is the root of all evil"

plinth
kostia
6

It's human nature to worry about performance, especially when the programming language supports low-level optimization. What we shouldn't forget as programmers, though, is that program performance is just one thing among many that we can optimize and admire. In addition to program speed, we can find beauty in our own performance. We can minimize our efforts while trying to achieve maximum visual output and user-interface interactivity. Don't you think that could be more motivating than worrying about bits and cycles in the long run? So yes, return strings. They minimize your code size and your efforts, and make the amount of work you put in less depressing.

AareP
5

In your case Return Value Optimization will take place so std::string will not be copied.

Kirill V. Lyadvinsky
  • 1
    That's not true. std::string is going to dynamically allocate a buffer and copy the entire string, and return value optimization will not do a lick here. However, he should still use std::string. After checking that getenv() didn't return NULL, that is! – Tom Jun 23 '09 at 12:40
  • 1
    There really will be only one allocation. I mean that the string itself would not be copied. – Kirill V. Lyadvinsky Jun 23 '09 at 12:47
  • 3
    +1: You're correct. Without the RVO, it would have to allocate two buffers and copy between them. – James Hopkin Jun 23 '09 at 14:52
4

Beware when you cross module boundaries.

Then it's best to return primitive types since C++ types are not necessarily binary compatible across even different versions of the same compiler.
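One common way to return only primitive types across a boundary is to have the caller supply the buffer. The sketch below is an assumption about how that could look, not a complete solution; `get_greeting` and `greeting_impl` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

namespace {
// Internal implementation: free to use std::string inside the module.
std::string greeting_impl() { return "hello"; }
}

// Hypothetical C-compatible export. Writes at most capacity - 1 characters
// plus a terminating nul into the caller's buffer, and returns the full
// length of the value (excluding the nul) so the caller can detect
// truncation and retry with a larger buffer.
extern "C" std::size_t get_greeting(char* buffer, std::size_t capacity) {
    const std::string value = greeting_impl();
    if (buffer != nullptr && capacity > 0) {
        const std::size_t n =
            value.size() < capacity - 1 ? value.size() : capacity - 1;
        std::memcpy(buffer, value.data(), n);
        buffer[n] = '\0';
    }
    return value.size();
}
```

No `std::string` object crosses the boundary here, so the two sides do not need to agree on its layout or allocator.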

Hans Malherbe
  • 7
    You need to do much more than just avoid C++ return types for that... you need to completely pimplize *all* C++ code to really be safe, at which point you're going to be creating a C wrapper on top of your existing codebase anyways, due to the nature of class declarations. – Tom Jun 24 '09 at 01:22
3

I agree with the other posters, that you should use string.

But know that, depending on how aggressively your compiler optimizes temporaries, you will probably have some extra overhead (over using a dynamic array of chars). (Note: The good news is that in C++0x, the judicious use of rvalue references will not require compiler optimizations to buy efficiency here - and programmers will be able to make some additional performance guarantees about their code without relying on the quality of the compiler.)

In your situation, is the extra overhead worth introducing manual memory management? Most reasonable programmers would disagree - but if your application does end up having performance issues, the next step would be to profile your application - thus, if you do introduce complexity, you only do it once you have good evidence that it is needed to improve overall efficiency.

Someone mentioned that Return Value optimization (RVO) is irrelevant here - I disagree.

The standard text (C++03) on this reads (12.2):

[Begin Standard Quote]

Temporaries of class type are created in various contexts: binding an rvalue to a reference (8.5.3), returning an rvalue (6.6.3), a conversion that creates an rvalue (4.1, 5.2.9, 5.2.11, 5.4), throwing an exception (15.1), entering a handler (15.3), and in some initializations (8.5). [Note: the lifetime of exception objects is described in 15.1. ] Even when the creation of the temporary object is avoided (12.8), all the semantic restrictions must be respected as if the temporary object was created. [Example: even if the copy constructor is not called, all the semantic restrictions, such as accessibility (clause 11), shall be satisfied. ]

 [Example:  
struct X {
  X(int);
  X(const X&);
  ~X();
};

X f(X);

void g()
{
  X a(1);
  X b = f(X(2));
  a = f(a);
}

Here, an implementation might use a temporary in which to construct X(2) before passing it to f() using X’s copy-constructor; alternatively, X(2) might be constructed in the space used to hold the argument. Also, a temporary might be used to hold the result of f(X(2)) before copying it to b using X’s copy-constructor; alternatively, f()’s result might be constructed in b. On the other hand, the expression a=f(a) requires a temporary for either the argument a or the result of f(a) to avoid undesired aliasing of a. ]

[End Standard Quote]

Essentially, the text above says that you can possibly rely on RVO in initialization situations, but not in assignment situations. The reason is, when you are initializing an object, there is no way that what you are initializing it with could ever be aliased to the object itself (which is why you never do a self check in a copy constructor), but when you do an assignment, it could.

There is nothing about your code that inherently prohibits RVO - but read your compiler documentation to ensure that you can truly rely on it, if you do indeed need it.
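The difference between initialization and assignment can be observed with a small tracer type. This is only a sketch; `tracer` and `make` are hypothetical names, and the number of copy constructions you see before C++17 depends on the compiler and its flags.

```cpp
// Hypothetical tracer type that counts copy constructions and assignments.
struct tracer {
    static int copies;   // number of copy constructions
    static int assigns;  // number of copy assignments
    tracer() {}
    tracer(const tracer&) { ++copies; }
    tracer& operator=(const tracer&) { ++assigns; return *this; }
};
int tracer::copies = 0;
int tracer::assigns = 0;

// Returns a temporary by value, like the function in the question.
tracer make() { return tracer(); }
```

With `tracer a = make();` the result may be constructed directly in `a`, so the copy constructor need not run at all (from C++17 onward this elision is mandatory for this pattern; earlier standards merely permit it). With `a = make();` the object `a` already exists, so the assignment operator always runs once.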

Faisal Vali
1

I agree with duffymo. You should make an understandable working application first and then, if there is a need, attack optimization. It is at this point that you will have an idea where the major bottlenecks are and will be able to more efficiently manage your time in making a faster app.

Brian
1

I agree with @duffymo. Don't optimize until you have measured; this holds doubly true for micro-optimizations. And always measure before and after you've optimized, to see if you actually changed things for the better.

JesperE
1

Return the string; it's not that big of a loss in terms of performance, and it will surely make your job easier afterward.

Plus, you could always inline the function, but most optimizers will take care of that anyway.

Gab Royer
1

If you pass a referenced string and you work on that string you don't need to return anything. ;)

Partial