Following on from the very good approach by @Casey, using a std::vector
instead of an array lets you break a line into as many words as it contains. Using a std::stringstream
and extracting with >>
gives a simple way to tokenize the sentence while ignoring leading, trailing, and repeated interior whitespace.
For example, you could do:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
int main (void) {

    std::string sentence = "  I am unable to store last word  ",
                word {};
    std::stringstream ss (sentence);    /* create stringstream from sentence */
    std::vector<std::string> words {};  /* vector of strings to hold words */

    while (ss >> word)                  /* read word */
        words.push_back(word);          /* add word to vector */

    /* output original sentence */
    std::cout << "sentence: \"" << sentence << "\"\n\n";

    for (const auto& w : words)         /* output all words in vector */
        std::cout << w << '\n';
}
Example Use/Output
$ ./bin/tokenize_sentence_ss
sentence: " I am unable to store last word "
I
am
unable
to
store
last
word
If you need more fine-grained control, you can use std::string::find_first_not_of
and std::string::find_first_of
with a set of delimiters to work your way through a string, skipping over delimiters to the start of each token with std::string::find_first_not_of
and then finding the end of that token (the next delimiter) with std::string::find_first_of
. That involves a bit more arithmetic, but is a more flexible alternative.