
I have the following code:

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main()
{
    string s;

    s = "server ('m1.labs.terad  ata.com') username ('us er5') password('user)5') dbname ('def\\ault')";

    regex re("(\'(.*?)\'\)");
    sregex_token_iterator i(s.begin(), s.end(), re, 1);
    sregex_token_iterator j;

    unsigned count = 0;
    while (i != j)
    {
        cout << *i << endl;
        count++;
        i++;
    }
    cout << "There were " << count << " tokens found." << endl;

    return 0;
}

The regex above is meant to extract everything between the single quotes.

But how can I change the regex so that it also handles escaped single quotes? For example, username ('user''5') should be extracted as 'user'5'.

Thanks in advance. I really need help with this; I have been trying for days.

Output:

'm1.labs.terad  ata.com'
'us er5'
'user)5'
'def\ault'

There were 4 tokens found.

Please note that the single quotes around each string should be kept.

But now, if my string is

 s = "server ('m1.labs.terad  ata.com') username ('us ''er5') password('user)5') dbname ('def\\ault')";

The output should be:

'm1.labs.terad  ata.com'
'us 'er5'   <<<<<<<<<<<<<<<<<<<
'user)5'
'def\ault'
hydra123
  • This is probably not best done with a regex match, where regular expressions can only match what is there so once you start getting into the realms of changing the values of the string you're outwith the scope of matching patterns. You may wish to use some sort of substitution regex, eg replace(\\\\, \\\), but I'm not familiar enough with `c++` to help you there. – Tom Wyllie Jul 21 '17 at 14:32
  • Please try this `regex re("(\\'.*?[^\\\\]\\')");` . Also, please notice, that you need to escape backslashes in c++ strings. – ikleschenkov Jul 21 '17 at 14:42
  • @ikleschenkov Sorry if I wasn't clear. I want to escape single quotes by writing two single quotes, i.e. 'user''6' should become 'user'6' – hydra123 Jul 21 '17 at 14:53
  • @hydra123 could you edit your question to show, what should be the input line (I don't see `'user''6'` in it) and what result you want to get? Do you want to extract all string between the single quotes and convert double single quotes to single single quotes? – ikleschenkov Jul 21 '17 at 15:04
  • `( <-- Unbalanced '(' \' ( .*? ) \'\)` Parsing C++ double-quoted strings only removes escapes on escapes. Your regex should not compile. –  Jul 21 '17 at 15:06
  • @ikleschenkov Yes I have edited. Please see. – hydra123 Jul 21 '17 at 15:09
  • @sln the regex given by ikleschenkov compiles successfully – hydra123 Jul 21 '17 at 15:09
  • You say that a single quote is _escaped_ by another single quote. Or is that something the IDE rendered wrong? –  Jul 21 '17 at 15:15
  • Yes, I am working with Postgres, where a single quote is escaped by two single quotes @sln. – hydra123 Jul 21 '17 at 15:15
  • Then, `'(user''5)'` should be extracted. But won't be as `'user'5'`. The stripping of that single quote is something you have to do after the match. –  Jul 21 '17 at 15:20
  • @sln I don't want to do that; I just want to modify my regex so that it accepts '' (two single quotes) inside the string. After that I can replace '' with ' – hydra123 Jul 21 '17 at 15:22
  • Here you go `(?<!')'[^']*(?:''[^']*)*'` or, if you can't do look behinds, use this `'[^']*(?:''[^']*)*'` –  Jul 21 '17 at 15:24
  • A regex is used to extract substrings, not to modify them (in the common case). So you have to extract the proper strings using @ubombi's regex `('(?:[^']|'')*?')(?!')` and then manually replace `''` with `'` – ikleschenkov Jul 21 '17 at 15:41
  • @ikleschenkov Can I also extract the argument names from the string: s = "server ('m1.labs.terad ata.com') username ('us er5') password('user)5') dbname ('def\\ault')"; i.e. token1 = server, token2 = username, ...? – hydra123 Jul 24 '17 at 03:54
  • @hydra123 try this: `regex re("(\\w+?)(?=\\s*\\(')");` It grabs words (contains `[a-zA-Z0-9_]` ) of nonzero length, after which there is `('` – ikleschenkov Jul 24 '17 at 12:06

2 Answers

But how can I make the regex so that it is able to extract escaped single quotes (example username (user''5) should be extracted as 'user'5'.

Ugh. Is that what you meant? I was right about the X/Y problem, then.

Note: what you describe is known as escaping special characters. There are two common ways to do it:

  1. repeat it (e.g. printf("100%%"); to print 100%)
  2. introduce it with another escape character (usually a backslash), e.g.

    std::cout << "Hello \"World\"" << std::endl;
    

    Or, one more intricate example:

    std::cout << "Newline is \\n" << std::endl;
    

Here you go: just add q >> char_(q) to accept repeated quotes as quote-escape:

auto quoted = [](char q) { 
    return lexeme[ q >> *(
              q >> char_(q)  // accept repeated quotes as quote-escape
            | '\\' >> char_  // accept backslash escape
            | char_ - q      // accept any other non-quote
         ) >> q]; };

Nothing else changes relative to the answer to "tokenizing string, accepting everything between given set of characters in CPP".

Live On Coliru

#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <map>
#include <stdexcept>  // std::invalid_argument
#include <string>

using Config = std::map<std::string, std::string>;
using Entry  = std::pair<std::string, std::string>;

namespace parser {
    using namespace boost::spirit::x3;

    template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
    auto quoted = [](char q) { return lexeme[q >> *(q >> char_(q) | '\\' >> char_ | char_ - q) >> q]; };

    auto value  = quoted('\'') | quoted('"');
    auto key    = lexeme[+alpha];
    auto pair   = key >> '(' >> value >> ')';
    auto config = skip(space) [ *as<Entry>(pair) ];
}

Config parse_config(std::string const& cfg) {
    Config parsed;
    auto f = cfg.begin(), l = cfg.end();
    if (!parse(f, l, parser::config, parsed))
        throw std::invalid_argument("Parse failed at " + std::string(f,l));
    return parsed;
}

int main() {
    auto const text = "server ('m1.labs.teradata.com') username ('use'')r_*5') password('u\" er 5') dbname ('default')";
    Config cfg = parse_config(text);

    for (auto& setting : cfg)
        std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}

Prints

Key dbname has value default
Key password has value u" er 5
Key server has value m1.labs.teradata.com
Key username has value use')r_*5
sehe

You should look at lookarounds and conditional regexes, and the regex engine should be PCRE-compatible (I don't know about C++).

You should never use a regex you found on the internet if you don't understand it.

Try something like '((?:[^']|'')*?)'(?!') (demo on regex101)

ubombi