
I am trying to split a string into sentences at punctuation marks (., ?, !). I found a way on StackOverflow to split a string on a single delimiter, but I have not been able to find a way to split on multiple delimiters at once. Here is the code I have so far:

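#include <string>
#include <vector>

// Takes `sentences` by reference so the caller sees the pushed tokens.
void chopSentences(std::string new_sentences, std::vector<std::string>& sentences) {
    size_t pos = 0;
    std::string token;
    std::string delimiter = ".";
    // Parenthesize the assignment: `!=` binds tighter than `=`.
    while ((pos = new_sentences.find(delimiter)) != std::string::npos) {
        token = new_sentences.substr(0, pos);
        sentences.push_back(token);
        new_sentences.erase(0, pos + delimiter.length());
    }
}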

Any idea how to make it handle more than one delimiter?

Remy Lebeau
Matt B.
  • You can use [strtok](https://www.cplusplus.com/reference/cstring/strtok/), which is a C function, but it gets the job done. – Kfir Ventura Aug 12 '21 at 16:28
  • [`std::basic_string::find_first_of()`](https://en.cppreference.com/w/cpp/string/basic_string/find_first_of) – Yksisarvinen Aug 12 '21 at 16:28
  • Note: the `strtok` function modifies the string that is searched. – Thomas Matthews Aug 12 '21 at 16:35
  • You are making unneeded copies of your original string in `new_sentences.erase(0, pos + delimiter.length());`. I would instead save the previous `pos` and search from it, then use both the previous and current `pos` in a call to `substr`. – Vlad Feinstein Aug 12 '21 at 16:38
  • Is using Boost's Tokenizer or algorithm::split an option? – Yun Aug 12 '21 at 16:43
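
The `find_first_of()` suggestion, combined with the advice to search from the previous position instead of erasing, might look roughly like the sketch below. The function name `splitOnAny` and its default delimiter set are made up for illustration, and trimming leading whitespace is an assumption about what the asker wants.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): splits `text` at any character
// found in `delims` and returns the non-empty pieces without leading whitespace.
std::vector<std::string> splitOnAny(const std::string& text,
                                    const std::string& delims = ".?!") {
    std::vector<std::string> pieces;
    std::size_t start = 0;  // where the current piece begins
    while (start < text.size()) {
        // Next delimiter at or after `start`; treat "none left" as end of text.
        std::size_t end = text.find_first_of(delims, start);
        if (end == std::string::npos) end = text.size();

        // Skip whitespace at the front of the piece.
        std::size_t first = text.find_first_not_of(" \t\n", start);
        if (first != std::string::npos && first < end)
            pieces.push_back(text.substr(first, end - first));

        start = end + 1;  // continue just past the delimiter
    }
    return pieces;
}
```

Called as `splitOnAny("Hello, Johnny! Are you there?")`, this should return the two pieces `Hello, Johnny` and `Are you there`. The original string is never modified or copied as a whole, only the extracted pieces are.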

1 Answer


If you're using C++11 or later, you can use std::regex_iterator:

#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
    std::string const s{"Hello, Johnny! Are you there?"};

    // Match maximal runs of non-punctuation characters.
    std::regex words_regex("[^[:punct:]\\?]+");
    auto words_begin =
        std::sregex_iterator(s.begin(), s.end(), words_regex);
    auto words_end = std::sregex_iterator();

    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " words:\n";

    // Print each match on its own line.
    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}

And then the printout is:

Found 3 words:
Hello
 Johnny
 Are you there

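Each match keeps its leading whitespace, so you'll have to adjust the regex further (or trim each match) to remove it. Note that [:punct:] also matches the comma, which is why "Hello" and " Johnny" are reported as separate matches.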
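
One possible adjustment, as a rough sketch: restrict the delimiters to `.`, `?` and `!`, and require each match to start on a non-whitespace character. The helper name `sentencesOf` is hypothetical, and the pattern is only one of several that would work.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): returns the pieces of
// `text` that lie between the sentence terminators '.', '?' and '!'.
std::vector<std::string> sentencesOf(const std::string& text) {
    // First character: neither whitespace nor a terminator.
    // Then: everything up to (but not including) the next terminator.
    static const std::regex sentence_regex("[^[:space:].?!][^.?!]*");

    std::vector<std::string> result;
    for (std::sregex_iterator it(text.begin(), text.end(), sentence_regex), end;
         it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

For the string used above, `sentencesOf("Hello, Johnny! Are you there?")` should yield `Hello, Johnny` and `Are you there`, with the comma left in place. Whitespace directly before a terminator is still kept, so a trailing trim may be needed depending on the input.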

KyleKnoepfel