The function template boost::algorithm::split_regex splits a string at every substring that matches the regex pattern passed to it. The question is: how can I split only once, at the first matching substring? That is, is it possible to make split_regex stop after its first split? Please see the following code.

#include <boost/algorithm/string/regex.hpp>
#include <boost/format.hpp>
#include <boost/regex.hpp>
#include <iostream>
#include <locale>
#include <string>
#include <vector>

int main(int argc, char *argv[])
{
    using namespace std;
    using boost::regex;
    locale::global(locale(""));
    // Create a standard string for experiment.
    string strRequestLine("Host: 192.168.0.1:12345");
    regex pat(R"(:\s*)", regex::perl | boost::regex_constants::match_stop);
    // Try to split the request line.
    vector<string> coll;
    boost::algorithm::split_regex(coll, strRequestLine, pat);
    // Output what we got.
    for (const auto& elt : coll)
        cout << boost::format("{%s}\n") % elt;
    // Exit the program.
    return 0;
}

How should the code be modified to produce the output

{Host}
{192.168.0.1:12345}

instead of the current output

{Host}
{192.168.0.1}
{12345}

Any suggestions or hints? Thanks.

Please note that I'm not asking how to do it with other functions or patterns. I'm asking whether split_regex can split only once and then stop. Because a regex object seems to be able to stop at the first match, I wonder whether passing it the proper flags would make split_regex stop at the first match.

Cody
  • Would `posn = strRequestLine.find_first_of(":");` be useful? – Angel Koh Dec 26 '14 at 06:11
  • The above code is just a demo illustrating my question. I wonder if `split_regex` can stop at the first match it finds. I already knew how to accomplish the task in other ways before posting this question. At the moment I'm only interested in using `split_regex`. Thanks anyway. – Cody Dec 26 '14 at 08:43
  • You might be interested in the first sample I listed **[here](http://stackoverflow.com/questions/26902755/skipping-blank-lines-when-reading-line-delimited-list-of-strings/26906134#comment42375456_26906134)**: [small HTTP response headers parsing function](http://paste.ubuntu.com/8989134/). Summary: use `phrase_parse(f, e, token >> ':' >> lexeme[*(char_ - eol)], space, key, value)` – sehe Dec 26 '14 at 10:08

2 Answers

For your specific input, the simplest fix seems to be changing the pattern to R"(:\s+)". Of course, this assumes that there is at least one whitespace character after Host: and none between the IP address and the port.
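To see why the pattern change matters, here is a minimal sketch that mimics split-on-every-match behaviour with std::regex instead of Boost. The helper name split_all is hypothetical, and std::sregex_token_iterator is only a stand-in for split_regex, so treat it as an illustration under those assumptions rather than a description of how Boost works internally.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// splits `text` at every match of `pattern`, similar in spirit to split_regex.
std::vector<std::string> split_all(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);  // default ECMAScript syntax
    // Submatch index -1 selects the pieces of text between the matches.
    std::sregex_token_iterator first(text.begin(), text.end(), re, -1), last;
    return std::vector<std::string>(first, last);
}
```

Under these assumptions, split_all("Host: 192.168.0.1:12345", R"(:\s*)") yields three pieces, because the colon before the port also matches. With R"(:\s+)" only the ": " after Host matches, so the result is {Host} and {192.168.0.1:12345}.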

Alternatively, skip split_regex() and use std::regex_match() instead:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string strRequestLine("Host: 192.168.0.1:12345");
    std::smatch results;
    if (std::regex_match(strRequestLine, results, std::regex(R"(([^:]*):\s*(.*))"))) {
        for (auto it(++results.begin()), end(results.end()); it != end; ++it) {
            std::cout << "{" << *it << "}\n";
        }
    }
}
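If the original separator pattern has to stay unchanged, another possible approach is to search for the first match only and take the text on either side of it. The sketch below uses std::regex_search with the prefix() and suffix() of the match results; the helper name split_once is hypothetical, and returning the whole input as the first element when nothing matches is an assumption made for this example.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: splits `text` at the first match of `re` only.
std::pair<std::string, std::string> split_once(const std::string& text, const std::regex& re)
{
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return {text, std::string()};  // assumption: no separator means no tail
    // prefix() is the text before the first match, suffix() the text after it.
    return {m.prefix().str(), m.suffix().str()};
}
```

Calling split_once(strRequestLine, std::regex(R"(:\s*)")) on the request line from the question gives Host as the first element and 192.168.0.1:12345 as the second.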
Dietmar Kühl

Expanding from my comment:

You might be interested in the first sample I listed here: small HTTP response headers parsing function. Summary: use phrase_parse(f, e, token >> ':' >> lexeme[*(char_ - eol)], space, key, value)

Here's a simple sample:

#include <boost/spirit/include/qi.hpp>
#include <string>
namespace qi = boost::spirit::qi;

namespace {
    typedef std::string::const_iterator It;

    // 2.2 Basic Rules (rfc1945)
    static const qi::rule<It, std::string()> rfc1945_token = +~qi::char_( " \t><@,;:\\\"/][?=}{:"); // FIXME? should filter CTLs
}

#include <iostream>

int main()
{
    std::string const strRequestLine("Host: 192.168.0.1:12345");
    std::string::const_iterator f(strRequestLine.begin()), l(strRequestLine.end());

    std::string key, value;
    if (qi::phrase_parse(f, l, rfc1945_token >> ':' >> qi::lexeme[*(qi::char_ - qi::eol)], qi::space, key, value))
        std::cout << "'" << key << "' -> '" << value << "'\n";
}

Prints

'Host' -> '192.168.0.1:12345'
sehe