
How do I parse tokens from an input string? For example:

char *aString = "Hello world";

I want the output to be:

"Hello" "world"

GEOCHET
Progress Programmer

5 Answers


You will want to use strtok; here is a good example.
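
A minimal sketch of the strtok approach, assuming words are separated by spaces or tabs (split_words is a hypothetical helper name, not a library function). Note that the question's char *aString = "Hello world"; points at a string literal, which must not be passed to strtok because strtok writes into its argument; the sketch therefore expects a modifiable array such as char buf[] = "Hello world";.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits the writable string `buf` into words
   separated by spaces or tabs. Stores up to `max` token pointers in
   `tokens` and returns how many were stored.
   strtok overwrites each delimiter with '\0', so `buf` must be a
   modifiable array (e.g. char buf[] = "Hello world";), not a pointer
   to a string literal. The stored tokens point into `buf`. */
size_t split_words(char *buf, char *tokens[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(buf, " \t");   /* first call: pass the string */

    while (tok != NULL && n < max) {
        tokens[n++] = tok;
        tok = strtok(NULL, " \t");    /* NULL means "continue where you left off" */
    }
    return n;
}
```

Called on char buf[] = "Hello world"; with room for 8 tokens, it returns 2, with tokens[0] pointing at "Hello" and tokens[1] at "world".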

Andrew Hare

Take a look at strtok, part of the C standard library (declared in <string.h>).

Todd Gamblin

strtok is the easy answer, but what you may really need is a lexer that handles these cases properly. Consider the following:

  • are there one or two spaces between "hello" and "world"?
  • could that in fact be any amount of whitespace?
  • could that include vertical whitespace (\n, \f, \v) or just horizontal whitespace (space, \t, \r)?
  • could that include any Unicode whitespace characters?
  • if there were punctuation between the words (as in "hello, world"), would the punctuation be a separate token, part of "hello,", or ignored?

As you can see, writing a proper lexer is not straightforward, and strtok is not a proper lexer.

Other solutions could be a character-at-a-time state machine that does precisely what you need, or a regex-based solution that generalizes the task of locating words versus gaps. There are many ways.
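
A character-at-a-time state machine could look something like the sketch below. It assumes a word is any run of characters for which isspace() is false, so punctuation stays attached ("hello, world" gives "hello," and "world"); scan_words and struct span are hypothetical names. Tokens are reported as offsets into the original string, so the input is never modified.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token record: offset and length into the original string,
   so the input does not have to be modified (unlike with strtok). */
struct span {
    size_t start;
    size_t len;
};

/* Scans `s` one character at a time with a two-state machine:
     IN_GAP  - skipping whitespace between words
     IN_WORD - inside a run of non-whitespace characters
   "Whitespace" is whatever isspace() accepts in the current locale
   (in the "C" locale: ' ', \t, \n, \v, \f, \r). Punctuation stays part
   of a word. Stores up to `max` spans and returns the total token count. */
size_t scan_words(const char *s, struct span out[], size_t max)
{
    enum { IN_GAP, IN_WORD } state = IN_GAP;
    size_t count = 0;
    size_t start = 0;
    size_t i;

    for (i = 0; ; i++) {
        unsigned char c = (unsigned char)s[i];   /* isspace needs unsigned char values */
        int gap = (c == '\0') || isspace(c);     /* end of string also ends a word */

        if (state == IN_GAP && !gap) {           /* a word begins */
            state = IN_WORD;
            start = i;
        } else if (state == IN_WORD && gap) {    /* a word ends */
            if (count < max) {
                out[count].start = start;
                out[count].len = i - start;
            }
            count++;
            state = IN_GAP;
        }
        if (c == '\0')
            break;
    }
    return count;
}
```

If punctuation should become separate tokens, one option would be a third state for punctuation runs; that is exactly the kind of requirement the questions above are meant to surface.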

And of course, all of this depends on what your actual requirements are, and I don't know them, so start with strtok. But it's good to be aware of the various limitations.

Paul Beckingham

For re-entrant versions, you can use strtok_s with Visual Studio or strtok_r on Unix.
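
To show why re-entrancy matters, here is a sketch of a hypothetical parse_pairs function that tokenizes a made-up "key=value;key=value" format with two nested strtok_r loops, each keeping its position in its own save pointer. It assumes a POSIX system (hence the _POSIX_C_SOURCE define).

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r in <string.h> on strict compilers */
#include <stddef.h>
#include <string.h>

/* Hypothetical example: parses a writable string like "a=1;b=2" into
   key/value pairs. The outer loop splits on ';' and the inner calls split
   each field on '='. Each loop has its own save pointer, which is what
   makes strtok_r re-entrant; with plain strtok the inner calls would
   overwrite the outer loop's hidden state.
   Stores up to `max` pairs and returns how many were stored. */
size_t parse_pairs(char *buf, char *keys[], char *vals[], size_t max)
{
    char *outer_save = NULL;
    size_t n = 0;

    for (char *field = strtok_r(buf, ";", &outer_save);
         field != NULL && n < max;
         field = strtok_r(NULL, ";", &outer_save)) {
        char *inner_save = NULL;                    /* independent state for this field */
        char *key = strtok_r(field, "=", &inner_save);
        char *val = strtok_r(NULL, "=", &inner_save);

        if (key != NULL && val != NULL) {           /* skip fields without '=' */
            keys[n] = key;
            vals[n] = val;
            n++;
        }
    }
    return n;
}
```

Visual Studio's strtok_s takes the same three arguments in the same order, so swapping the name should be enough there. Note that the C11 Annex K strtok_s is a different, four-argument function.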

Leon

Keep in mind that strtok is hard to use correctly, because:

  • It modifies the input.
  • Each delimiter it finds is overwritten with a null terminator.
  • It merges adjacent delimiters, so empty fields are silently skipped.
  • It is not thread-safe, since it keeps its position in hidden static state.
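
One way around these pitfalls is to scan the input without modifying it. The sketch below uses the standard strcspn function; it is not necessarily the alternative this answer links to. split_fields and struct field are hypothetical names, and the sketch assumes you want empty fields preserved, e.g. for comma-separated data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical field record: a pointer into the original string plus a
   length. Fields are not null-terminated because the input is not modified. */
struct field {
    const char *ptr;
    size_t len;
};

/* Splits `s` at every character in `delims` using strcspn, which only reads
   the string. Unlike strtok, adjacent delimiters produce empty fields:
   "a,,b" yields "a", "", "b". No static state is kept, so different threads
   can split different strings at the same time.
   Stores up to `max` fields and returns the total number of fields. */
size_t split_fields(const char *s, const char *delims,
                    struct field out[], size_t max)
{
    size_t count = 0;

    for (;;) {
        size_t len = strcspn(s, delims);  /* length up to next delimiter or end */
        if (count < max) {
            out[count].ptr = s;
            out[count].len = len;
        }
        count++;
        if (s[len] == '\0')               /* no more delimiters */
            break;
        s += len + 1;                     /* skip exactly one delimiter */
    }
    return count;
}
```

For "a,,b" with delimiter ",", it returns 3 fields ("a", an empty field, and "b"), where strtok would return only "a" and "b". Since the fields are not null-terminated, print one with printf("%.*s", (int)f.len, f.ptr).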

You can read about this alternative.

dirkgently