Is there a way to use boost::split to split a string when a blank line is encountered?

Here is a snippet of what I mean.

std::stringstream source;
source.str(input_string);
std::string line;
std::getline(source, line, '\0');
std::vector<std::string> token;
boost::split(token, line, boost::is_any_of("what goes here for blank line"));
zstreet

1 Answer

You can split on a double newline ("\n\n"), unless by "blank line" you meant "a line that may contain other whitespace".

#include <boost/regex.hpp>
#include <boost/algorithm/string_regex.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <sstream>
#include <iostream>
#include <iomanip>
#include <iterator> // std::istreambuf_iterator
#include <string>
#include <vector>

int main() {
    std::stringstream source;
    source.str(R"(line one

that was an empty line, now some whitespace:
      
bye)");

    std::string line(std::istreambuf_iterator<char>(source), {});
    std::vector<std::string> tokens;

    auto re = boost::regex("\n\n");
    boost::split_regex(tokens, line, re);

    for (auto const& token : tokens) {
        std::cout << std::quoted(token) << "\n";
    }
}

Prints

"line one"
"that was an empty line, now some whitespace:
      
bye"
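
If linking Boost.Regex is not an option, the same fixed-delimiter split can be sketched with std::string_view::find alone. This is only an illustration, assuming C++17; the helper name split_on is hypothetical and is not part of Boost or the standard library:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (an assumption for illustration, not a Boost API):
// splits `text` at every occurrence of the string `delim`.
std::vector<std::string> split_on(std::string_view text, std::string_view delim) {
    std::vector<std::string> parts;
    if (delim.empty()) {
        // Guard: searching for an empty delimiter would never advance.
        parts.emplace_back(text);
        return parts;
    }
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            // No further delimiter: the remainder is the last piece.
            parts.emplace_back(text.substr(start));
            return parts;
        }
        parts.emplace_back(text.substr(start, pos - start));
        start = pos + delim.size();
    }
}
```

Calling split_on(line, "\n\n") on the input above should give the same two tokens as the split_regex call, since the whitespace-only line is not matched by a plain "\n\n".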

Allow whitespace on "empty" lines

Just express it in a regular expression:

auto re = boost::regex(R"(\n\s*\n)");

Now the output is:

"line one"
"that was an empty line, now some whitespace:"
"bye"
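
The same pattern also works with std::regex, which avoids linking against Boost.Regex. The following is a minimal sketch and not necessarily the code behind any link on this page; the function name split_paragraphs is hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): splits `text`
// wherever a newline is followed by optional whitespace and another newline.
std::vector<std::string> split_paragraphs(const std::string& text) {
    // Default ECMAScript grammar; \s also matches '\n', so runs of
    // several blank lines are consumed as a single separator.
    static const std::regex blank(R"(\n\s*\n)");

    // Submatch index -1 asks the iterator for the text between matches
    // rather than the matches themselves.
    std::sregex_token_iterator first(text.begin(), text.end(), blank, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

With the sample input above, split_paragraphs(line) should yield the same three tokens. The regex is held in a named static object because std::sregex_token_iterator cannot be constructed from a temporary std::regex.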
sehe
  • Okay, I am getting an error initially, so let me clarify: my input_string is a string variable, so does that change the R"(...)" argument in your source.str(...) call? I have been trying to implement the "\n\n" split in so many different ways, but this looks promising... – zstreet Jun 20 '20 at 21:24
  • You started from a stringstream. But as you can see `line` is already a string. Drop the stream if you want: http://coliru.stacked-crooked.com/a/9f092da3fab87748 – sehe Jun 20 '20 at 21:26
  • For fun, not using Boost Regex (no linking) and handling huge files without allocating lots of tokens: http://coliru.stacked-crooked.com/a/3e1e0682d44b42b7 – sehe Jun 20 '20 at 21:50
  • Using std::regex: http://coliru.stacked-crooked.com/a/e90b178f71eadcda – sehe Jun 20 '20 at 22:06
  • Yeah. That's inaccurate. Boost Regex is much better in many regards, but then there is the aspect of "it's part of the standard library" that's hard to beat. And the interface that _is_ present is pretty similar. – sehe Jun 20 '20 at 22:21
  • It's not so much an opinion. Mostly [the performance is much better than any of the standard library implementations](https://stackoverflow.com/a/14229152/85371) (can be, due to standard conformance). Only Boost has full Perl regex support. There's a lot of confusion, but std::regex has a dialect **based on** ECMAScript, which is a subset of Perl. See here for a comprehensive feature comparison: https://en.wikipedia.org/wiki/Comparison_of_regular-expression_engines#Language_features. For simple tasks like these I'd use std::regex. For high performance, Boost Xpressive/Boost Regex. – sehe Jun 20 '20 at 22:37
  • I figured it was performance, thank you so much! I am using Boost in other places already so definitely will change to boost regex too! – zstreet Jun 20 '20 at 22:42
  • What a great answer. Why has no one voted it up? (I just noticed the question was asked only 3 days earlier.) – Zhang Jun 24 '20 at 09:28