I have the following code:

int main()
{
    string s = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";

    regex re("(\'[!-~]+\')");
    sregex_token_iterator i(s.begin(), s.end(), re, 1);
    sregex_token_iterator j;

    unsigned count = 0;
    while(i != j)
    {
        cout << "the token is  " << *i++ << endl;
        count++;
    }
    cout << "There were " << count << " tokens found." << endl;

    return 0;
}

Using the above regex, I want to extract each string enclosed by a parenthesis and a single quote. The output should look like this:

the token is   'm1.labs.teradata.com'
the token is   'use\')r_*5'
the token is   'u" er 5'
the token is   'default'
There were 4 tokens found.

Basically, the regex is supposed to extract everything between " (' " and " ') ". The content can be anything: a space, a special character, a quote, or a closing parenthesis. I had earlier used the following regex:

boost::regex re_arg_values("(\'[!-~]+\')");

But it does not accept spaces. Can someone please help me out with this? Thanks in advance.

hydra123
  • Does this code compile correctly? I cannot compile it; a `syntax error` occurred. – Bryant Jul 21 '17 at 12:33
  • Replace `regex re("('([^'\\]*(?:\\[\s\S][^'\\]*)*)')");` with `regex re("(\'[!-~]+\')");` and then try again – hydra123 Jul 21 '17 at 12:34
  • You have to post the full source code so that everyone can help you. Still an error. – Bryant Jul 21 '17 at 12:36
  • 1. To match `()` you need to escape them, as they are used to mark subexpressions. 2. I doubt you can parse such a string with a regex; you need a parser. – Slava Jul 21 '17 at 12:38
  • @Slava I was just gonna say/show :) – sehe Jul 21 '17 at 12:38
  • hydra123, you posted code with the wrong syntax. Slava and sehe just pointed out your mistake. It is recommended to post again. – Bryant Jul 21 '17 at 12:41
  • @Bryant Instead, it's recommended to edit the post – sehe Jul 21 '17 at 12:41
  • You can write a parser manually (using regex as well) or use a special tool like `boost::spirit` or `lex` or something else. I do not think you can parse it with regex only. – Slava Jul 21 '17 at 12:44
  • Posted a proper parser function that also parses the keys, see [new answer](https://stackoverflow.com/a/45238705/85371) /cc @Slava – sehe Jul 21 '17 at 13:18
  • @sehe: Yes, apologies, I will edit the post. Is it possible to get the values in single quotes, i.e. 'm1.labs.teradata.com', 'user (5', etc.? And can the extracted key/value pair be converted to a string? – hydra123 Jul 21 '17 at 13:32
  • What do you need the single quotes for? That sounds like an [X/Y problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Basically, programs **never** deal with the quoted/escaped versions. They merely exist because of text representation issues. – sehe Jul 21 '17 at 13:36
  • What about `"'" + s + "'"`? Or [`std::quoted(s, '\'')`](http://en.cppreference.com/w/cpp/io/manip/quoted)? Similarly for a key/value pair: `k + "('" + v + "')"`? – sehe Jul 21 '17 at 13:36
  • @sehe I am new to this and I don't know how it works. Can you edit the code and show it? Thanks in advance – hydra123 Jul 21 '17 at 13:42

2 Answers

2

Here's a sample that uses Spirit X3 to create a grammar that actually parses this. I'd like to parse into a map of (key->value) pairs, which makes a lot more sense than just blindly assuming the names are always the same:

using Config = std::map<std::string, std::string>;
using Entry  = std::pair<std::string, std::string>;

Now, we set up some grammar rules using X3:

namespace parser {
    using namespace boost::spirit::x3;

    auto value  = quoted("'") | quoted('"');
    auto key    = lexeme[+alpha];
    auto pair   = key >> '(' >> value >> ')';
    auto config = skip(space) [ *as<Entry>(pair) ];
}

The helpers as<> and quoted are simple lambdas (they must be declared before the rules that use them, as in the full listing further down):

template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *('\\' >> char_ | char_ - q) >> q]; };
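
To make the behaviour of the `quoted` rule concrete, here is a rough hand-written equivalent that uses only the standard library. It is a sketch, not part of the answer's grammar: the name `read_quoted` and its signature are made up for illustration. Like the rule, it expects an opening quote `q`, treats a backslash as "take the next character literally", and stops at the first unescaped closing quote.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the answer): a plain C++ analogue of the
// X3 rule  lexeme[q >> *('\\' >> char_ | char_ - q) >> q].
// Expects input[pos] to be the opening quote q. On success, stores the
// unescaped contents in `out`, moves `pos` just past the closing quote and
// returns true. On failure, returns false and leaves `pos` unchanged.
bool read_quoted(const std::string& input, std::size_t& pos, char q, std::string& out) {
    if (pos >= input.size() || input[pos] != q)
        return false;

    std::string value;
    std::size_t i = pos + 1;
    while (i < input.size()) {
        const char c = input[i];
        if (c == '\\') {
            // Corresponds to ('\\' >> char_): drop the backslash, keep the next char.
            if (i + 1 >= input.size())
                return false; // dangling backslash at end of input
            value += input[i + 1];
            i += 2;
        } else if (c == q) {
            // Unescaped closing quote: done.
            out = value;
            pos = i + 1;
            return true;
        } else {
            // Corresponds to (char_ - q): any other character is taken as-is.
            value += c;
            ++i;
        }
    }
    return false; // ran out of input before the closing quote
}
```

Called on the text `'use\')r_*5') ...` with `q = '\''`, this yields `use')r_*5` and leaves `pos` at the closing parenthesis, which is the same value the grammar stores for the `username` key.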

Now we can parse the string into a map directly:

Config parse_config(std::string const& cfg) {
    Config parsed;
    auto f = cfg.begin(), l = cfg.end();
    if (!parse(f, l, parser::config, parsed))
        throw std::invalid_argument("Parse failed at " + std::string(f,l));
    return parsed;
}

And the demo program

int main() {
    Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");

    for (auto& setting : cfg)
        std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}

Prints

Key dbname has value default
Key password has value u" er 5
Key server has value m1.labs.teradata.com
Key username has value use')r_*5
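
If the quoted form is needed again (for display, or to write the settings back out), one option suggested in the comments is `std::quoted` with a single quote as the delimiter. The sketch below assumes that backslash is the escape character, as in the sample input; the helper names `quote_value` and `format_setting` are hypothetical and not part of the parser above.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: wraps a parsed value in single quotes again.
// std::quoted (C++14) writes the delimiter, then the text with every
// delimiter or escape character prefixed by the escape ('\\' by default),
// then the delimiter.
std::string quote_value(const std::string& value) {
    std::ostringstream out;
    out << std::quoted(value, '\'');
    return out.str();
}

// Hypothetical helper: rebuilds one "key ('value')" entry as a string.
std::string format_setting(const std::string& key, const std::string& value) {
    return key + " (" + quote_value(value) + ")";
}
```

For example, `format_setting("username", "use')r_*5")` produces `username ('use\')r_*5')`, so the unescaped value stored in the map round-trips to the original text form.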

LIVE DEMO

Live On Coliru

#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_pair.hpp>

using Config = std::map<std::string, std::string>;
using Entry  = std::pair<std::string, std::string>;

namespace parser {
    using namespace boost::spirit::x3;

    template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
    auto quoted = [](auto q) { return lexeme[q >> *(('\\' >> char_) | (char_ - q)) >> q]; };

    auto value  = quoted("'") | quoted('"');
    auto key    = lexeme[+alpha];
    auto pair   = key >> '(' >> value >> ')';
    auto config = skip(space) [ *as<Entry>(pair) ];
}

Config parse_config(std::string const& cfg) {
    Config parsed;
    auto f = cfg.begin(), l = cfg.end();
    if (!parse(f, l, parser::config, parsed))
        throw std::invalid_argument("Parse failed at " + std::string(f,l));
    return parsed;
}

int main() {
    Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");

    for (auto& setting : cfg)
        std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}

Bonus

If you want to learn how to extract the raw input: just try

auto source = skip(space) [ *raw [ pair ] ]; 

as in this:

using RawSettings = std::vector<std::string>;

RawSettings parse_raw_config(std::string const& cfg) {
    RawSettings parsed;
    auto f = cfg.begin(), l = cfg.end();
    if (!parse(f, l, parser::source, parsed))
        throw std::invalid_argument("Parse failed at " + std::string(f,l));
    return parsed;
}

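int main() {
    // Same sample input as in the demo above
    std::string const text = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";
    for (auto& setting : parse_raw_config(text))
        std::cout << "Raw: " << setting << "\n";
}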

Which prints (Live On Coliru):

Raw: server ('m1.labs.teradata.com')
Raw: username ('use\')r_*5')
Raw: password('u" er 5')
Raw: dbname ('default')
sehe
  • If you want to learn how to extract the raw input: just try `auto source = skip(space) [ *raw [ pair ] ];` as in this [Live Demo](http://coliru.stacked-crooked.com/a/5e6f109cef66a3ec) – sehe Jul 21 '17 at 13:42
0

Fixing a few syntax and style issues:

  • you need to escape `\` in C string literals (or use a raw string literal, as done for the regex below)
  • you had an unescaped `"` in `s`, which caused a syntax error

#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>

int main() {
    std::string s = "server ('m1.labs.teradata.com') username ('use\')r_*5') password('u' er 5') dbname ('default')";

    boost::regex re(R"(('([^'\\]*(?:\\[\s\S][^'\\]*)*)'))");

    size_t count = 0;
    for (auto tok : boost::make_iterator_range(boost::sregex_token_iterator(s.begin(), s.end(), re, 1), {})) {
        std::cout << "Token " << ++count << " is " << tok << "\n";
    }
}

Prints

Token 1 is 'm1.labs.teradata.com'
Token 2 is 'use'
Token 3 is ') password('
Token 4 is ' er 5'
Token 5 is 'default'
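
The split into five tokens above follows from the input literal: in an ordinary string literal `\'` is just `'`, so the backslash never reaches the regex. As a sketch of how the four tokens from the question could be obtained, the version below keeps the backslash by using a raw string literal for the input and uses `std::regex` instead of Boost. The pattern is a simplified variant (`\\.` in place of `\\[\s\S]`, so an escaped newline would not match), and the helper name `quoted_tokens` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every single-quoted token, quotes included.
std::vector<std::string> quoted_tokens(const std::string& s) {
    // '                 opening quote
    // (?:[^'\\]|\\.)*   any char that is neither quote nor backslash,
    //                   or a backslash followed by any one char
    // '                 closing quote
    static const std::regex re(R"re('(?:[^'\\]|\\.)*')re");

    std::vector<std::string> tokens;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

With the input written as `R"cfg(server ('m1.labs.teradata.com') username ('use\')r_*5') password('u" er 5') dbname ('default'))cfg"`, the function returns `'m1.labs.teradata.com'`, `'use\')r_*5'`, `'u" er 5'` and `'default'`. A regex like this only handles the quoting; it does not check the keys or the parentheses, which is where a real parser is the more robust choice.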
sehe