Here is an approach with a few test cases as well.
The test cases are a series of char
arrays with particular strings to test the findNextWord()
method of the RetVal
struct/class.
char line1[] = "this is1 a line. \t of text \n "; // multiple white spaces
char line2[] = "another line"; // string that ends with zero terminator, no newline
char line3[] = "\n"; // line with newline only
char line4[] = ""; // empty string with no text
And here is the actual source code.
#include <iostream>
#include <cstring>
#include <cstring>
struct RetVal {
RetVal(char *p1, char *p2) : pFirst(p1), pLast(p2) {}
RetVal(char *p2 = nullptr) : pFirst(nullptr), pLast(p2) {}
char *pFirst;
char *pLast;
bool findNextWord()
{
if (pLast && *pLast) {
pFirst = pLast;
// scan the input line looking for the first non-space character.
// the isspace() function indicates true for any of the following
// characters: space, newline, tab, carriage return, etc.
while (*pFirst && isspace(*pFirst)) pFirst++;
if (pFirst && *pFirst) {
// we have found a non-space character so now we look
// for a space character or the end of string.
pLast = pFirst;
while (*pLast && ! isspace(*pLast)) pLast++;
}
else {
// indicate we are done with this string.
pFirst = pLast = nullptr;
}
}
else {
pFirst = nullptr;
}
// return value indicates if we are still processing, true, or if we are done, false.
return pFirst != nullptr;
}
};
void printWords(RetVal &x)
{
int iCount = 0;
while (x.findNextWord()) {
char xWord[128] = { 0 };
strncpy(xWord, x.pFirst, x.pLast - x.pFirst);
iCount++;
std::cout << "word " << iCount << " is \"" << xWord << "\"" << std::endl;
}
std::cout << "total word count is " << iCount << std::endl;
}
int main()
{
char line1[] = "this is1 a line. \t of text \n ";
char line2[] = "another line";
char line3[] = "\n";
char line4[] = "";
std::cout << "Process line1[] \"" << line1 << "\"" << std::endl;
RetVal x (line1);
printWords(x);
std::cout << std::endl << "Process line2[] \"" << line2 << "\"" << std::endl;
RetVal x2 (line2);
printWords(x2);
std::cout << std::endl << "Process line3[] \"" << line3 << "\"" << std::endl;
RetVal x3 (line3);
printWords(x3);
std::cout << std::endl << "Process line4[] \"" << line4 << "\"" << std::endl;
RetVal x4(line4);
printWords(x4);
return 0;
}
And here is the output from this program. In some cases the line to be processed has a new line in it which affects the output by performing a new line when printed to the console.
Process line1[] "this is1 a line. of text
"
word 1 is "this"
word 2 is "is1"
word 3 is "a"
word 4 is "line."
word 5 is "of"
word 6 is "text"
total word count is 6
Process line2[] "another line"
word 1 is "another"
word 2 is "line"
total word count is 2
Process line3[] "
"
total word count is 0
Process line4[] ""
total word count is 0
If you need to treat punctuation similar to white space, as something to be ignored, then you can modify the findNextWord()
method to include the ispunct()
test of characters in the loops as in:
bool findNextWord()
{
if (pLast && *pLast) {
pFirst = pLast;
// scan the input line looking for the first non-space character.
// the isspace() function indicates true for any of the following
// characters: space, newline, tab, carriage return, etc.
while (*pFirst && (isspace(*pFirst) || ispunct(*pFirst))) pFirst++;
if (pFirst && *pFirst) {
// we have found a non-space character so now we look
// for a space character or the end of string.
pLast = pFirst;
while (*pLast && ! (isspace(*pLast) || ispunct (*pLast))) pLast++;
}
else {
// indicate we are done with this string.
pFirst = pLast = nullptr;
}
}
else {
pFirst = nullptr;
}
// return value indicates if we are still processing, true, or if we are done, false.
return pFirst != nullptr;
}
In general if you need to refine the filters for the beginning and ending of words, you can modify those two places with some other function that looks at a character and classifies it as either a valid character for a word or not.