
I would like to parse a string using whitespace as delimiters. I know it can be done using three methods from the string library: assign, substring, and find. Basically, I want to assign my target string the substring from the beginning of the string up to the first occurrence of whitespace.

It would be something like this:

while(!line.empty())
{

    line.assign(line.substr(0, line.find(" ")
    line.erase(0, line.find(" ");

}

My question is, would this be a good way to go about parsing a string?

Dietmar Kühl
Steffan Harris
  • Unless I'm reading this wrong, you might want to add an extra `)` or two on each of those `line.` lines. :) – summea Feb 25 '12 at 06:21
  • This is heavily commented on here: http://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c – Timeout Feb 25 '12 at 06:21

1 Answer


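You'd need a bit more than just one line variable: as is, assign() overwrites line with its first word, so everything after the first space is lost (the following erase() then empties line, so the loop body runs only once). Without the assign(), the erase() on its own would never terminate once the remaining string starts with a space: find() returns 0, and erase(0, 0) removes nothing. Of course, this assumes that the code compiles in the first place, but a number of parentheses and a semicolon are missing. Also, your question doesn't match your code: there is no member function substring (it is called substr). If you want to program effectively, you have to be very precise about everything, because computers take your statements very literally. You probably want to chop your line up into its pieces and put them into some sort of container.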
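A minimal sketch of the loop with the problems fixed, assuming the pieces should be collected in a std::vector<std::string>; the function name split_on_space is hypothetical. It sticks to the members from the question's code (find(), assign(), erase(), empty()) plus clear(), and uses a separate word variable so that line keeps the unprocessed rest. It still erases from the front of the string on every iteration, so it keeps the performance cost discussed below.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the question's assign/find/erase loop, repaired.
// A separate `word` receives each piece, so `line` keeps the remainder.
std::vector<std::string> split_on_space(std::string line)
{
    std::vector<std::string> words;
    while (!line.empty()) {
        std::string::size_type pos = line.find(' ');
        std::string word;
        word.assign(line, 0, pos);   // characters before the space, or all of them if pos == npos
        words.push_back(word);
        if (pos == std::string::npos)
            line.clear();            // no space left: this was the last word
        else
            line.erase(0, pos + 1);  // drop the word *and* the space so the loop makes progress
    }
    return words;
}
```

With this version, two adjacent spaces produce an empty string between the words.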

Once past these trivial aspects, you might want to consider using ' ' instead of " ": with a single char, the compiler and library know that you are looking for just one character rather than a sequence of characters, which can be quite a bit faster. However, ' ' is not whitespace but just a space. Whitespace also includes '\t' (tab), '\n' (newline), '\v' (vertical tab), '\f' (form feed), and '\r' (carriage return); to search for any one of them, use find_first_of() with a string listing them.

Speaking of speed, you probably don't want to erase() pieces from the beginning of the string: each erase() shifts the remaining characters, which yields an O(n * n) algorithm, while the same job can be done in O(n), e.g., by keeping a variable with the current position. This also avoids searching twice per iteration, which is unnecessarily expensive.

You should also consider the behavior when there are two adjacent spaces: should this produce an empty string in your sequence, or should it be treated as if there were one space (in which case you might also want to use find_first_not_of())?
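A minimal sketch of the position-based approach, assuming the pieces should go into a std::vector<std::string>; the name split and its keep_empty flag are hypothetical. Instead of erasing from the front, start walks forward through the unchanged string, so the whole split runs in O(n). The default delimiter set holds the six characters std::isspace treats as whitespace in the "C" locale (backspace is not one of them). keep_empty selects between the two behaviors for adjacent delimiters: an empty string per gap, or runs collapsed via find_first_not_of().

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on any character in `delims` without
// modifying it; `start` tracks the current position, so no erase() is needed.
// keep_empty == true:  adjacent delimiters yield empty strings.
// keep_empty == false: runs of delimiters act as a single separator.
std::vector<std::string> split(const std::string& line,
                               const std::string& delims = " \t\n\v\f\r",
                               bool keep_empty = false)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        if (!keep_empty) {
            // Skip a whole run of delimiters; stop if only delimiters remain.
            start = line.find_first_not_of(delims, start);
            if (start == std::string::npos)
                break;
        }
        std::string::size_type end = line.find_first_of(delims, start);
        // If end == npos, the length is huge and substr() clamps it to the rest of the string.
        fields.push_back(line.substr(start, end - start));
        if (end == std::string::npos)
            break;
        start = end + 1;  // continue just past the delimiter
    }
    return fields;
}
```

If you only ever split on a plain space, line.find(' ', start) could replace the find_first_of() call.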

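Actually, I just reread the problem statement: if you just want to extract the string up to the first space and insist on using assign(), you don't need erase(), empty(), or a loop. However, this can be done using just two std::string members, find_first_of() and erase(). Note that find() would search for the whole sequence of characters rather than any one of them, and that erase() throws std::out_of_range when handed npos, so that case needs a check:

std::string::size_type pos = line.find_first_of(" \t\n\v\f\r");
if (pos != std::string::npos) line.erase(pos);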
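For comparison, a minimal sketch of the assign() route mentioned above, which needs neither erase(), empty(), nor a loop; the function name first_word is hypothetical. It uses the standard whitespace set with find_first_of(). Because substr() clamps its length argument, the npos case needs no special handling here.

```cpp
#include <string>

// Hypothetical helper: keeps only the text before the first whitespace character.
// If find_first_of() returns npos, substr(0, npos) is the whole string.
std::string first_word(std::string line)
{
    line.assign(line.substr(0, line.find_first_of(" \t\n\v\f\r")));
    return line;
}
```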
Dietmar Kühl