30

Initializing std::string from a NULL char pointer is undefined behaviour, I believe. So, here are alternative versions of a constructor, where mStdString is a member variable of type std::string:

void MyClass::MyClass(const char *cstr) :
    mStdString( cstr ? cstr : "")
{}

void MyClass::MyClass(const char *cstr) :
    mStdString(cstr ? std::string(cstr) : std::string())
{}

void MyClass::MyClass(const char *cstr)
{
    if (cstr) mStdString = cstr;
    // else keep default-constructed mStdString
}

Edit, constructor declaration inside class MyClass:

MyClass(const char *cstr = NULL);

Which of these, or possibly something else, is the best or most proper way to initialize std::string from a possibly NULL pointer, and why? Is it different for different C++ standards? Assume normal release build optimization flags.

I'm looking for an answer with explanation of why a way is the right way, or an answer with a reference link (this also applies if answer is "doesn't matter"), not just personal opinions (but if you must, at least make it just a comment).

hyde
  • 60,639
  • 21
  • 115
  • 176
  • That all depends on what you want a nullptr to mean. Is it the same as an empty string for you, or should it have a different meaning? – PlasmaHH Jul 04 '13 at 07:45
  • I think you should use the first option because you call the constructor of the string just once. – Alexis Jul 04 '13 at 07:45
  • 1
    @PlasmaHH Well, I want it to do what the code snippets do, when char pointer is NULL. – hyde Jul 04 '13 at 07:47
  • 5
    Off-topic: There's a reason why `std::string` doesn't accept null pointers. I wonder if it's a good idea to casually treat null pointers and empty strings the same. I personally would make such a thing explicit in the calling code, rather than hide it in some constructor. – Kerrek SB Jul 04 '13 at 07:54
  • 1
    @KerrekSB Not so off-topic. It may be the most relevant bit of information here. I suggest that you add it to your answer. – Daniel Daranas Jul 04 '13 at 07:55
  • 1
    @DanielDaranas: The OP has explicitly requested the `// no-lint` flag on this question and doesn't want a conceptual critique :-S (In other words, I can only load the gun for you and aim it at your foot, but you have to pull the trigger yourself.) – Kerrek SB Jul 04 '13 at 07:58
  • @KerrekSB Well, in the actual code which inspired this question, NULL means "string not used", and that is actually default argument value (edited question). One alternative would be to make two constructors, but that would be more code and less robust (accidentally passing NULL to this constructor). – hyde Jul 04 '13 at 08:01
  • @hyde: Yeah, I figured that... Somehow this always irks me. If you have something which may be *optionally* available, then coercing a default value removes the distinction between "defaulted to empty" and "explicitly set to empty". I would prefer an abstraction like `class key_value_item`, which has a boolean test `is_set()`, and a member `get_value()` which returns the empty string, but the empty string is produced dynamically as `is_set() ? value_ : default_value`. – Kerrek SB Jul 04 '13 at 08:14
  • 3
    @KerrekSB I usually don't care what the OP wants or says, beyond the question itself. Once posted, this is a StackOverflow question. It belongs to the StackOverflow community, not to him. – Daniel Daranas Jul 04 '13 at 08:46

4 Answers4

20

The last one is silly because it doesn't use initialization when it could.

The first two are completely identical semantically (think of the c_str() member function), so prefer the first version because it is the most direct and idiomatic, and easiest to read.

(There would be a semantic difference if std::string had a constexpr default constructor, but it doesn't. Still, it's possible that std::string() is different from std::string(""), but I don't know any implementations that do this, since it doesn't seem to make a lot of sense. On the other hand, popular small-string optimizations nowadays mean that both versions will probably not perform any dynamic allocation.)


Update: As @Jonathan points out, the two string constructors will probably execute different code, and if that matters to you (though it really shouldn't), you might consider a fourth version:

: cstr ? cstr : std::string()

Both readable and default-constructing.


Second update: But prefer cstr ? cstr : "". As you can see below, when both branches call the same constructor, this can be implemented very efficiently using conditional moves and no branches. (So the two versions do indeed generate different code, but the first one is better.)


For giggles, I've run both versions through Clang 3.3, with -O3, on x86_64, for a struct foo; like yours and a function foo bar(char const * p) { return p; }:

Default constructor (std::string()):

    .cfi_offset r14, -16
    mov     R14, RSI
    mov     RBX, RDI
    test    R14, R14
    je      .LBB0_2
    mov     RDI, R14
    call    strlen
    mov     RDI, RBX
    mov     RSI, R14
    mov     RDX, RAX
    call    _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm
    jmp     .LBB0_3
.LBB0_2:
    xorps   XMM0, XMM0
    movups  XMMWORD PTR [RBX], XMM0
    mov     QWORD PTR [RBX + 16], 0
.LBB0_3:
    mov     RAX, RBX
    add     RSP, 8
    pop     RBX
    pop     R14
    ret

Empty-string constructor (""):

    .cfi_offset r14, -16
    mov     R14, RDI
    mov     EBX, .L.str
    test    RSI, RSI
    cmovne  RBX, RSI
    mov     RDI, RBX
    call    strlen
    mov     RDI, R14
    mov     RSI, RBX
    mov     RDX, RAX
    call    _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm
    mov     RAX, R14
    add     RSP, 8
    pop     RBX
    pop     R14
    ret

.L.str:
    .zero    1
    .size    .L.str, 1

In my case, it would even appear that "" generates better code: Both versions call strlen, but the empty-string version doesn't use any jumps, only conditional moves (since the same constructor is called, just with two different arguments). Of course that's a completely meaningless, non-portable and non-transferable observation, but it just goes to show that the compiler doesn't always need as much help as you might think. Just write the code that looks best.

David G
  • 94,763
  • 41
  • 167
  • 253
Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • 2
    The first two are not identical, even IMHO semantically, there is a call to `strlen` in one of them, which the compiler can probably replace at compile-time, but you still call different code paths for `std::string()` and `std::string("", 0)`. Since you know the contents are empty calling the default constructor makes more sense than passing, counting and copying a zero-length `char` array – Jonathan Wakely Jul 04 '13 at 08:14
  • Well, some widely used C++ string classes (at least Qt `QString`) do expose concept of "null string" distinguishable from "empty string", which is part of the motivation why I asked this question. – hyde Jul 04 '13 at 08:14
  • 3
    @JonathanWakely: Right, not identical in code, but semantically... (and `strlen` on an empty string is still pretty good). Hey, what about `cstr ? cstr : string()`? – Kerrek SB Jul 04 '13 at 08:16
  • The default constructor is non-throwing. Not important in this case, since the other branch of the conditional expression could still throw, but in general it's a semantic difference. Your edit shows my preferred form though, so +1 – Jonathan Wakely Jul 04 '13 at 08:20
  • @JonathanWakely: Good point about exceptions, but indeed, *one* of the two branches can always throw. – Kerrek SB Jul 04 '13 at 08:23
  • I'm trying to create some assembly for comparison, though I'm thwarted by GCC's reference-counting strings. I'll use Clang instead for a more appropriate demonstration. – Kerrek SB Jul 04 '13 at 08:24
  • Actually I just checked the standard and the `std::string` default constructor is not `noexcept` ... I thought it was – Jonathan Wakely Jul 04 '13 at 08:30
  • 2
    @JonathanWakely: Also I just discovered that the simple `cstr ? cstr : ""` can be implemented very neatly with conditional moves, so using the *same* constructor is actually a boon. – Kerrek SB Jul 04 '13 at 08:36
  • 1
    Ah, nice! That hadn't occurred to me – Jonathan Wakely Jul 04 '13 at 08:47
  • Will `cstr ? cstr : string()` even compile (your fourth version)? Here ternary conditional returns different types in different branches, that's illegal. – Ruslan Dec 22 '15 at 16:24
  • @Ruslan: What about `double x = c ? 1.0L : 1.0f` -- illegal? – Kerrek SB Dec 22 '15 at 16:25
  • Ah, sorry, I remembered the rules wrongly. Completely forgot that they can be converted to a common type. – Ruslan Dec 22 '15 at 16:29
7

First thing first, you're right, from http://www.cplusplus.com/reference/string/string/string/:

If s is a null pointer, if n == npos, or if the range specified by [first,last) is not valid, it causes undefined behavior.

Also, it depends on what a NULL pointer means for you. I assume its the same than an empty string for you.

I would go with the first one, because its the one I read best. First solution and second are the same. Third one wouldn't work if your string was const.

jogojapan
  • 68,383
  • 11
  • 101
  • 131
Xaqq
  • 4,308
  • 2
  • 25
  • 38
1

Assuming you're happy with cstr == NULL yielding an empty mStdString, I think the first one is probably the best.

If nothing else, the third option you provide doesn't work if mStdString is const. The middle option benefits from "move semantics" under C++11, but is less obviously optimal or reasonable.

So, my vote goes with the first option.

Joe Z
  • 17,413
  • 3
  • 28
  • 39
  • "Assuming reasonable optimizations" we can probably assume that no actual moves are needed and that any copies are elided. – Kerrek SB Jul 04 '13 at 07:50
0

Whilst this may not REALLY be an answer (especially as you formulated the question) - but it's too long to fit as a comment and has code in it which doesn't ork in comments. I fully expect to get downvoted and have to delete this post - but I feel compelled to say something.

WHY would the initialization char * be NULL - and if so, couldn't you push it to the caller to make some sense of that situation - such as passing an empty string, or "unknown" or "(null)" as appropriate.

In other words, something like this:

void MyClass::MyClass(const char *cstr) 
{ 
    assert(cstr != NULL);   // or "throw cstr_must_not_be_null;" or some such. 
    mStdString = cstr;
}

(There is probably some clever way to do this in an initializer list, but I can't be bothered to figure out how to correctly do that).

I for one is not keen on NULL as an input to a string parameter in any other way than "This really doesn't exist" - and if that's what you are actually trying to replicate, then you should have a boolean to say "doesn't exist", or a pointer to a std::string that can be NULL if no string is present.

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • For the sake of argument, consider that you have a class where you want the default value of some parameter to be the result of a virtual member function call. You can't actually set the default by invoking the method (as you have no object to call it on), and passing in any fixed value (even an empty string) would diminish the possible valid values that the user could supply. That is, if you used the empty string to indicate the default value, then the empty string could not also be used as a valid value itself. Using the nullptr in this case seems appropriately justified, IMHO. – monkey0506 Jul 19 '15 at 05:32
  • I said a virtual member function, but please pardon the "virtual". I understand the implications of calling a virtual function from inside a constructor in the same class hierarchy. For the purposes of my comment, non-virtual member functions are sufficient. – monkey0506 Jul 19 '15 at 05:45