
Something that follows this interface:

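#include <list>
#include <string>

class StreamTokenizer
{
public:
    explicit StreamTokenizer(const std::string& delimiter);
    // Feeds one chunk and returns the tokens completed so far.
    std::list<std::string> add_data(const std::string& chunk);
    // Returns the trailing text that no delimiter has terminated yet.
    std::string get_left_over();
};

// Intended usage:
StreamTokenizer d(" ");
std::list<std::string> tokens;
tokens.splice(tokens.end(), d.add_data("tok"));
tokens.splice(tokens.end(), d.add_data("1 t"));
tokens.splice(tokens.end(), d.add_data("ok2 tok3"));
tokens.push_back(d.get_left_over());
// tokens = {tok1, tok2, tok3}
// d = {}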

It receives data in chunks. After each chunk it should return all the tokens it has found so far, carry any leftover (incomplete) text over to the next chunk, and not hold on to data that has already been tokenized.

Please do not suggest stringstream unless you can show how to erase already-tokenized data from it (my stream is virtually infinite).

Ezra

1 Answer


Yes, it's called "the standard library."

What you're asking for seems to fall within the range that streams can handle pretty easily.

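#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::stringstream d;

    // Chunks arrive separately; the stream concatenates them.
    d << "tok";
    d << "1 t";
    d << "ok2 tok3";

    // Extract every whitespace-separated token currently in the stream.
    std::vector<std::string> tokens((std::istream_iterator<std::string>(d)),
                                     std::istream_iterator<std::string>());

    for (const std::string& s : tokens)
        std::cout << s << "\n";
}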

Result:

tok1
tok2
tok3

I haven't shown a "get the rest" function here. I suppose istream::read would probably be the obvious choice.

I suppose I should add: by default, strings will be broken at anything the stream interprets as white-space, not just the space character. You can change what it interprets as white-space by writing a custom ctype facet and imbuing the stream with that facet. For example, I showed how to do that for - and / in a previous answer.
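To make the facet idea concrete, here is a minimal sketch. It is not the code from the previous answer mentioned above; the names delimiter_ctype and split_on are made up for this example. It assumes single-character delimiters, and it marks only the characters you pass in as white-space, so an ordinary space stops being a separator unless you list it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet (the name is an assumption of this sketch): classifies
// only the characters in `delims` as white-space, so operator>> splits on
// exactly those characters.
class delimiter_ctype : public std::ctype<char>
{
public:
    explicit delimiter_ctype(const std::string& delims, std::size_t refs = 0)
        // del == true: the base class delete[]s the table in its destructor.
        : std::ctype<char>(make_table(delims), true, refs) {}

private:
    static const mask* make_table(const std::string& delims)
    {
        // Value-initialized: no character has any classification bit set.
        mask* table = new mask[table_size]();
        for (unsigned char c : delims)
            table[c] = space;
        return table;
    }
};

// Hypothetical helper: tokenizes `text`, treating each char in `delims`
// as a separator.
std::vector<std::string> split_on(const std::string& text,
                                  const std::string& delims)
{
    assert(!delims.empty());  // this sketch expects at least one delimiter

    std::istringstream in(text);
    // The locale takes ownership of the facet (refs == 0).
    in.imbue(std::locale(in.getloc(), new delimiter_ctype(delims)));

    std::istream_iterator<std::string> first(in), last;
    return std::vector<std::string>(first, last);
}
```

With this, split_on("2013-06/06", "-/") should give 2013, 06 and 06, while split_on("a b-c", "-") keeps "a b" together because the space is no longer treated as white-space.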

Jerry Coffin
  • Thanks. But I find it insufficient for two reasons: 1. It does not clean the stream. 2. By building tokens I get all the tokens that have ever existed on the stream. – Ezra Jun 06 '13 at 19:27
  • @Ezra: What do you mean by "clean the stream"? `tokens` is like any other vector (or other container of your choice). You can remove any or all items from it as needed. Bottom line: it does what you ask for in your question. If you want something else/more, you probably need to edit your question to explain what you really want. – Jerry Coffin Jun 06 '13 at 19:30
  • stringstream buffer contains all the data you have ever inserted into it and calling tokens does not change that. This is unacceptable for my situation. My program should tokenize the stream continuously immediately upon inserting data into it. – Ezra Jun 06 '13 at 19:35
  • @Ezra: That would certainly be different -- from the standard library, or any alternative of which I'm aware. At least AFAIK, if you really need that, you'll probably have to do it on your own. – Jerry Coffin Jun 06 '13 at 20:05
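
For anyone who does end up doing it on their own, the interface from the question need not take much code. The following is only a sketch under a few assumptions that the question does not settle: the delimiter is a non-empty string, empty tokens between adjacent delimiters are skipped, and get_left_over() empties the internal buffer. Only the text after the last delimiter seen so far is kept, so already-tokenized data is not held.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

class StreamTokenizer
{
public:
    explicit StreamTokenizer(std::string delimiter)
        : delimiter_(std::move(delimiter))
    {
        assert(!delimiter_.empty());  // assumption: delimiter is non-empty
    }

    // Appends a chunk to the pending text and returns every token that is
    // now terminated by a delimiter.
    std::list<std::string> add_data(const std::string& chunk)
    {
        buffer_ += chunk;

        std::list<std::string> tokens;
        std::string::size_type start = 0;
        std::string::size_type end;
        while ((end = buffer_.find(delimiter_, start)) != std::string::npos) {
            if (end > start)  // assumption: skip empty tokens
                tokens.push_back(buffer_.substr(start, end - start));
            start = end + delimiter_.size();
        }

        // Drop everything already tokenized; keep only the unfinished tail.
        buffer_.erase(0, start);
        return tokens;
    }

    // Returns the unfinished tail and leaves the tokenizer empty.
    std::string get_left_over()
    {
        std::string rest;
        rest.swap(buffer_);
        return rest;
    }

private:
    std::string delimiter_;
    std::string buffer_;  // text after the last delimiter seen so far
};
```

Because the tail is re-scanned together with the next chunk, a multi-character delimiter that happens to be split across two chunks should still be found. Memory use is bounded by the longest single token rather than by the length of the stream.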