
I tracked down an obscure logging bug to the fact that initializer lists of length 2 appear to be a special case! How is this possible?

The code was compiled with Apple LLVM version 5.1 (clang-503.0.40), using CXXFLAGS=-std=c++11 -stdlib=libc++.

#include <stdio.h>

#include <string>
#include <vector>

using namespace std;

typedef vector<string> Strings;

void print(string const& s) {
    printf("%s\n", s.c_str());  // pass the text as data, never as the format string
}

void print(Strings const& ss, string const& name) {
    print("Test " + name);
    print("Number of strings: " + to_string(ss.size()));
    for (auto& s: ss) {
        auto t = "length = " + to_string(s.size()) + ": " + s;
        print(t);
    }
    print("\n");
}

void test() {
    Strings a{{"hello"}};                  print(a, "a");
    Strings b{{"hello", "there"}};         print(b, "b");
    Strings c{{"hello", "there", "kids"}}; print(c, "c");

    Strings A{"hello"};                    print(A, "A");
    Strings B{"hello", "there"};           print(B, "B");
    Strings C{"hello", "there", "kids"};   print(C, "C");
}

int main() {
    test();
}

Output:

Test a
Number of strings: 1
length = 5: hello

Test b
Number of strings: 1
length = 8: hello

Test c
Number of strings: 3
length = 5: hello
length = 5: there
length = 4: kids

Test A
Number of strings: 1
length = 5: hello

Test B
Number of strings: 2
length = 5: hello
length = 5: there

Test C
Number of strings: 3
length = 5: hello
length = 5: there
length = 4: kids

I should also add that the length of the bogus string in test b seems to be indeterminate: it is always greater than the length of the first initializer string, but it has varied from one more than that length to the combined length of the two strings in the initializer.

Filip Roséen - refp
Tom Swirly
  • Why the double braces? – chris Jun 09 '14 at 00:35
  • Sorry, I should have made it clear that the problem only occurs with the double braces. The single braces are "correct" - but why do I get the inconsistency when I have double braces? I extended the example to include the correct initialization using single braces for comparison. – Tom Swirly Jun 09 '14 at 00:42
  • The double braces should be valid because the outer braces cause a regular constructor lookup, and the inner braces match the `std::initializer_list` constructor parameter. But, it's definitely weird. – Potatoswatter Jun 09 '14 at 00:42
  • @Potatoswatter, That's what I'm thinking, but yeah. – chris Jun 09 '14 at 00:43
  • Wait, is the length in test `b` 6 or 8? The revision changed it. – Potatoswatter Jun 09 '14 at 00:45
  • Any reason why you're using cstdio instead of iostream? – Rubens Jun 09 '14 at 00:45
  • I added a comment to point out that the result of the length of the string in test b seems to be variable! though apparently not on a given compilation. I'm using stdio because in the original code, I call some library routine in a library I factored out that looks like stdio. I'd imagine I get the same results with C++ IO. – Tom Swirly Jun 09 '14 at 00:47
  • @TomSwirly Indeed. And yes, I've got the same results compiling with `g++ (GCC) 4.9.0`. – Rubens Jun 09 '14 at 00:50
  • I would investigate interaction with vector constructors, especially the one taking two iterators – Cheers and hth. - Alf Jun 09 '14 at 00:50
  • Got it. Let me form an answer – chris Jun 09 '14 at 00:52
  • It crashes with Visual C++, which is evidence of UB at work, which is evidence of constructor interaction. – Cheers and hth. - Alf Jun 09 '14 at 00:52
  • What's even weirder is that the program throws an exception when you instantiate a `Strings` in main but it goes away when you comment out the `print()` calls in `test()`. I think there's some UB going on. -- http://coliru.stacked-crooked.com/a/bf9b59160c6f46b0 – David G Jun 09 '14 at 00:55
  • Filed a [defect report](https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/q980Ys5_Hm0). – Potatoswatter Jun 09 '14 at 02:00
  • @Potatoswatter, Interesting. I'm pretty sure I did test it with two different-sized arrays at one point as well. – chris Jun 09 '14 at 02:45
  • related but not a dup: http://stackoverflow.com/q/19847960/819272 – TemplateRex Jun 09 '14 at 09:15

2 Answers


Introduction

Imagine the following declaration, and usage:

struct A {
  A (std::initializer_list<std::string>);
};

A {{"a"          }}; // (A), initialization of 1 string
A {{"a", "b"     }}; // (B), initialization of 1 string << !!
A {{"a", "b", "c"}}; // (C), initialization of 3 strings

In (A) and (C), each C-style string initializes one (1) std::string but, as you stated in your question, (B) differs.

The compiler sees that a std::string can be constructed from a begin iterator and an end iterator, and when it parses statement (B) it prefers that constructor over treating "a" and "b" as individual initializers for two elements.

A { std::string { "a", "b" } }; // the compiler's interpretation of (B)

Note: "a" and "b" both have type char const[2], which implicitly decays to char const*, a pointer type that can act as an iterator denoting either begin or end when creating a std::string. But we must be careful: this causes undefined behavior, since there is no (guaranteed) relation between the two pointers when that constructor is invoked, so they need not delimit a valid range.
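To illustrate the note, here is a minimal sketch of the same iterator-pair constructor used safely: both pointers come from one array, so they delimit a valid range. The helper name from_same_array is made up for this example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// std::string can be built from two char const* values treated as iterators;
// this is the kind of constructor that statement (B) ends up selecting.
static_assert(std::is_constructible<std::string, char const*, char const*>::value,
              "std::string accepts a pointer pair as an iterator range");

// Hypothetical helper: builds a string from a range that lies inside ONE array.
inline std::string from_same_array() {
    static char const word[] = "hello";  // type char const[6], decays to char const*
    char const* first = word;             // begin of the range
    char const* last  = word + 5;         // one past 'o', still inside the same array
    return std::string(first, last);      // well-defined: [first, last) is a valid range
}
```

With two separate literals there is no such guarantee, which is why (B) is undefined rather than merely surprising.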


Explanation

When you invoke a constructor taking a std::initializer_list using double braces {{ a, b, ... }}, there are two possible interpretations:

  1. The outer braces refer to the constructor itself, and the inner braces denote the elements that make up the std::initializer_list, or:

  2. The outer braces refer to the std::initializer_list, whereas the inner braces denote the initialization of an element inside it.

Interpretation 2) is preferred whenever it is possible, and since std::string has a constructor taking two iterators, that constructor is the one being called when you write std::vector<std::string> {{ "hello", "there" }}.

Further example:

std::vector<std::string> {{"this", "is"}, {"stackoverflow"}}.size (); // yields 2
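The same preference can be observed without undefined behavior by swapping in toy types. In this sketch, Item and Holder are hypothetical stand-ins for std::string and std::vector<std::string>; Item simply records how many arguments its constructor received.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-in for std::string: constructible from one or two pointers.
struct Item {
    int args;
    Item(char const*) : args(1) {}
    Item(char const*, char const*) : args(2) {}  // plays the role of the iterator-pair constructor
};

// Hypothetical stand-in for std::vector<std::string>.
struct Holder {
    std::size_t count;  // number of elements in the initializer_list
    int first_args;     // how the first element was constructed (0 if the list is empty)
    Holder(std::initializer_list<Item> items)
        : count(items.size()),
          first_args(items.size() ? items.begin()->args : 0) {}
};

// Hypothetical helpers, one per brace pattern from the answer.
inline Holder one_in_double()   { return Holder{{"a"}}; }
inline Holder two_in_double()   { return Holder{{"a", "b"}}; }
inline Holder three_in_double() { return Holder{{"a", "b", "c"}}; }
inline Holder two_in_single()   { return Holder{"a", "b"}; }
```

Under these assumptions, two_in_double() should hold a single Item built by the two-argument constructor, while three_in_double() falls back to three separate elements because Item has no three-argument constructor.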

Solution

Don't use double braces for such initialization.
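As a minimal sketch of that advice, a single pair of braces makes each literal its own element. The function name make_greeting is invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: single braces, so each literal initializes one std::string.
inline std::vector<std::string> make_greeting() {
    return std::vector<std::string>{"hello", "there"};
}
```

Here the vector has two elements, matching test B in the question.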

Filip Roséen - refp

First of all, this is undefined behaviour unless I'm missing something obvious. Now let me explain. The vector is being constructed from an initializer list of strings. However this list only contains one string. This string is formed by the inner {"Hello", "there"}. How? With the iterator constructor. Essentially, for (auto it = "Hello"; it != "there"; ++it) is forming a string containing Hello\0.
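A well-defined way to see what that loop produces is to simulate the memory layout by hand. This sketch assumes the two literals sit back to back, or with padding up to an 8-byte boundary; the language guarantees neither, and the buffers and the function name are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed layout 1: "there" directly follows the terminator of "Hello".
static char const packed[] = "Hello\0there";
// Assumed layout 2: "there" starts at offset 8 (two padding bytes after the terminator).
static char const padded[] = "Hello\0\0\0there";

// Copies [buf, buf + offset_of_second), i.e. every byte before the second word.
inline std::string up_to_second(char const* buf, std::size_t offset_of_second) {
    return std::string(buf, buf + offset_of_second);  // both pointers lie in one array
}
```

With the packed layout, up_to_second(packed, 6) has length 6: the five letters plus the terminator. With the padded layout, up_to_second(padded, 8) has length 8, which would be consistent with the length reported for test b in the question. Printing either through printf shows only the first word, because output stops at the first '\0'.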

For a simple example, see here. While UB is reason enough, it would seem the second literal is being placed right after the first in memory. As a bonus, do "Hello", "Hello" and you'll probably get a string of length 0. If you don't understand anything in here, I recommend reading Filip's excellent answer.

chris
  • … and if the compiler decides to put `"there"` at a lower address than `"Hello"`, you get a crash. – Potatoswatter Jun 09 '14 at 00:56
  • Hah! It had to be undefined behavior. But wait, why isn't it an infinite loop? Answer: because due to a whim of the compiler, the two strings were laid out more or less contiguously in memory! – Tom Swirly Jun 09 '14 at 00:57
  • @Potatoswatter, Yeah, this is a really interesting occurrence. I noticed doing `"Hello", "Hello"` gave a string of length 0. – chris Jun 09 '14 at 00:57
  • I'm off for food now. I'm not going to mark it correct until I get back just to encourage you to edit it but, well, I'm pretty sure you're right... :-) – Tom Swirly Jun 09 '14 at 00:57
  • @chris: depends on compiler settings, they might have a length of zero, or any other length – Mooing Duck Jun 09 '14 at 17:04
  • @MooingDuck, Of course. The logical thing to do there is reuse the array, but the whole thing is shaky at best. My answer is more explicit on that than the comment. – chris Jun 09 '14 at 17:31
  • @chris 0 from reuse and 8 from word alignment are explainable and not that interesting. Getting 4 for "Hello" and "o" would be cool (though unlikely) but the most interesting thing and part of what makes C++ so fun is the answer itself. – Nick Jun 09 '14 at 21:33