
I have this code:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

int main() {
    using namespace std;
    boost::char_separator<char> sep(",");

    string s1 = "hello, world";
    tokenizer tok1(s1, sep);
    for (auto& token : tok1) {
        cout << token << " ";
    }
    cout << endl;

    tokenizer tok2(string("hello, world"), sep);
    for (auto& token : tok2) {
        cout << token << " ";
    }
    cout << endl;

    tokenizer tok3(string("hello, world, !!"), sep);
    for (auto& token : tok3) {
        cout << token << " ";
    }
    cout << endl;

    return 0;
}

This code produces the following result:

hello  world 
hello  
hello  world  !!

Obviously, the second line is wrong. I was expecting hello world instead. What is the problem?

r.v
  • I don't know exactly how the tokeniser works, but I expect this is a scope issue. The tokeniser probably only keeps a reference to the string, but the string goes out of scope, and is therefore garbage memory when you print it. – Dave Jun 09 '13 at 16:52

1 Answer


The tokenizer does not create a copy of the string you pass as the first argument to its constructor, nor does it compute all the tokens upon construction and then cache them. Token extraction is performed lazily, on demand.

However, for that to be possible, the object on which the token extraction is performed must stay alive for as long as tokens are being extracted.
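To make the lifetime requirement concrete, here is a minimal sketch of a splitter that stores only iterators into the caller's string. LazySplitter and collect are hypothetical names used for illustration; this is not how Boost implements boost::tokenizer, and unlike char_separator it does not drop empty tokens. It only mirrors the property described above: no copy of the input is made, and tokens are produced one at a time on request.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration, not Boost's implementation: the splitter keeps
// only iterators into the caller's string and never copies the string itself.
class LazySplitter {
public:
    LazySplitter(const std::string& input, char delimiter)
        : pos_(input.begin()), end_(input.end()),
          delimiter_(delimiter), done_(false) {}

    // Extracts the next token on demand; returns false when none are left.
    bool next(std::string& token) {
        if (done_) return false;
        std::string::const_iterator start = pos_;
        while (pos_ != end_ && *pos_ != delimiter_) ++pos_;
        token.assign(start, pos_);
        if (pos_ == end_) {
            done_ = true;
        } else {
            ++pos_;  // skip the delimiter
        }
        return true;
    }

private:
    std::string::const_iterator pos_;
    std::string::const_iterator end_;
    char delimiter_;
    bool done_;
};

// `input` refers to a string owned by the caller, so it outlives the splitter.
std::vector<std::string> collect(const std::string& input, char delimiter) {
    LazySplitter splitter(input, delimiter);
    std::vector<std::string> tokens;
    std::string token;
    while (splitter.next(token)) tokens.push_back(token);
    return tokens;
}
```

Constructing a LazySplitter from a temporary such as std::string("hello, world") and calling next() in a later statement would read through iterators into a destroyed string, which is the same mistake as in the question.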

In your code, the temporary string passed to the constructor of tok2 is destroyed at the end of the full expression that initializes tok2 (the same applies to tok3). This means you get undefined behavior as soon as the tokenizer tries to use iterators into that string.

Notice that tok3 gives you the expected output purely by chance: the expected output is just one of the possible outcomes of a program with undefined behavior.

Andy Prowl
  • Thanks! Shouldn't the compiler complain that I am passing an rvalue? – r.v Jun 09 '13 at 17:00
  • @r.v: Nope, the compiler does not have to figure out whether your program will end up dereferencing invalid iterators or dangling references or pointers. In the general case, this is impossible. – Andy Prowl Jun 09 '13 at 17:03