5

Suppose I have a string:

argsStr = "server ('m1.labs.terad  ''ata.com') username ('us ''er5') password('user)5') dbname ('def\\ault')";

I am using the code further below to extract the following tokens:

'm1.labs.terad  ''ata.com'  <- token1
'us ''er5'                    <-token2
'user)5'                    <-token3
'def\ault'                  <-token4

Code:

typedef std::vector<std::string> StringVector;
StringVector arg_values;
// Matches a single-quoted value; a doubled quote ('') inside it is an escaped quote.
boost::regex re_arg_values("('[^']*(?:''[^']*)*')");
boost::sregex_token_iterator value_iter_start(argsStr.begin(), argsStr.end(), re_arg_values, 0), value_iter_end;
// Put the tokens into the string vector.
std::copy(value_iter_start, value_iter_end, std::back_inserter(arg_values));

Now, after putting the tokens into the string vector, how can I replace each doubled single quote ('') in a token with one single quote (')?

For example:

'm1.labs.terad ''ata.com' should become 'm1.labs.terad 'ata.com' and 'us ''er5' should become 'us 'er5'.

Can I use boost::replace_all for this?

sehe
hydra123
  • Why don't you write a function, that iterates over each token and when finds two consecutive `'`,take only one of them. – Gaurav Sehgal Jul 24 '17 at 04:57
  • It's not clear what you are asking, in the first example you erased both quotes, in the second you replaced them with a single one. – Matteo Italia Jul 24 '17 at 05:02
  • @MatteoItalia my bad. Sorry have corrected. – hydra123 Jul 24 '17 at 05:06
  • Hi, Is this fine : Boost::replace_all(s," ' ' ", " ' "); ?? – hydra123 Jul 24 '17 at 05:12
  • 1
    @hydra123 `replace_all()` would work, I believe that's what you are looking for. – PeskyPotato Jul 24 '17 at 05:37
  • @sehe the question is different. Better watch the whole thing! – hydra123 Jul 24 '17 at 12:15
  • @hydra123 Believe me I know. I've objectively [put more effort into this](https://stackoverflow.com/a/45281989/85371) than anyone else. – sehe Jul 24 '17 at 13:34
  • replace_all helps. @sehe, if you have a problem with so many questions, don't bother replying. I had to add the questions because I had only one string as an example and many questions. – hydra123 Jul 24 '17 at 16:48
  • @sehe there is a reason I do not want to write the parser. And I guess people have supported the regex (mind you, most people didn't say "regex isn't the tool for the job"), and the regex is working fine! Thanks for your time. – hydra123 Jul 24 '17 at 17:11
  • I wish you good luck. I'm not going to go back to find out exactly which people "supported the regex". All I note is that you can't seem to make the code work _using regex_. Instead of railing after the fact, you could simply say why you reject the answer. Instead, you don't respond _at all_ to the answers (this has been the first time you say anything remotely relevant like "there is a reason I do not want to", although it's not clear /what/ you don't want to: you're "writing the parser" anyway, but with crutches named regex). – sehe Jul 24 '17 at 20:22
  • And no I don't have a problem with many questions. But in this case the sequence of questions show you don't know what you're doing. And I care about that: I want to show you [parsing into datastructures](https://stackoverflow.com/a/45238705/85371), explain what [character escapes](https://stackoverflow.com/a/45242982/85371) are and [that escapes are a presentation-only thing](https://stackoverflow.com/questions/45237637/q/45238705#comment77444192_45237637). I just want people to learn how to do things right and know why. It's _why we are here_. – sehe Jul 24 '17 at 20:31

2 Answers

7

Okay. You've been asking about this parsing job for 6 questions straight¹.

Many people have been telling you regex is not the tool for the job, including me.

I've shown you

  • An example of a Spirit X3 grammar that parses this config string into a key-value map, correctly interpreting escaped quotes (e.g. '\\'') (see here)
  • An expansion of it (in 13 characters) that allows repeated quotes to escape a quote (see here)

All my examples have been superior in that they already parse the keys along with the values, so you have a proper map of config settings.

Yet you still ask for it in your latest question (Extract everything apart from what is specified in the regex).

Of course the answer was in my very first answer:

for (auto& setting : parse_config(text))
    std::cout << setting.first << "\n";

I posted this along with a C++03 version of it live on Coliru

Writing The Manual Parser

If you are rejecting it because you don't understand, all you had to do is ask.

If you "don't wanna" use Spirit, you can easily write a similar parser manually. I didn't, because it is tedious and error prone. Here you are in case you need it for inspiration:

  1. still c++03
  2. using only standard library features
  3. still parsing single/double-quoted strings with escapable quotes
  4. still parses into map<string, string>
  5. raises informative error messages on invalid input

BOTTOM LINE: Use a proper grammar, as people have been urging you to since day 1

Live On Coliru

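#include <cctype>    // std::isspace, std::isalpha
#include <iostream>
#include <map>
#include <sstream>
#include <stdexcept> // std::runtime_error
#include <string>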

typedef std::map<std::string, std::string> Config;
typedef std::pair<std::string, std::string> Entry;

struct Parser {
    Parser(std::string const& input) : input(input) {}
    Config parse() {
        Config parsed;

        enum { KEY, VALUE } state = KEY;
        key = value = "";
        f = input.begin(), l = input.end();

        while (f!=l) {
            //std::cout << "state=" << state << ", '" << std::string(It(input.begin()), f) << "[" << *f << "]" << std::string(f+1, l) << "'\n";
            switch (state) {
              case KEY:
                  skipws();
                  if (!parse_key())
                      raise("Empty key");

                  state = VALUE;
                  break;
              case VALUE:
                  if (!expect('(', true))
                      raise("Expected '('");

                  if (parse_value('\'') || parse_value('"')) {
                      parsed[key] = value;
                      key = value = "";
                  } else {
                      raise("Expected quoted value");
                  }

                  if (!expect(')', true))
                      raise("Expected ')'");

                  state = KEY;
                  break;
            };
        }

        if (!(key.empty() && value.empty() && state==KEY))
            raise("Unexpected end of input");

        return parsed;
    }

  private:
    std::string input;

    typedef std::string::const_iterator It;
    It f, l;
    std::string key, value;

    bool parse_key() {
        while (f!=l && alpha(*f))
            key += *f++;
        return !key.empty();
    }

    bool parse_value(char quote) {
        if (!expect(quote, true))
            return false;

        while (f!=l) {
            char const ch = *f++;
            if (ch == quote) {
                if (expect(quote, false)) {
                    value += quote;
                } else {
                    //std::cout << " Entry " << key << " -> " << value << "\n";
                    return true;
                }
            } else {
                value += ch;
            }
        }

        return false;
    }

    static bool space(unsigned char ch) { return std::isspace(ch); }
    static bool alpha(unsigned char ch) { return std::isalpha(ch); }
    void skipws() { while (f!=l && space(*f)) ++f; }
    bool expect(unsigned char ch, bool ws = true) {
        if (ws) skipws();
        if (f!=l && *f == ch) {
            ++f;
            if (ws) skipws();
            return true;
        }
        return false;
    }

    void raise(std::string const& msg) {
        std::ostringstream oss;
        oss << msg << " (at '" << std::string(f,l) << "')";
        throw std::runtime_error(oss.str());
    }
};

int main() {
    std::string const text = "server ('m1.labs.terad  ''ata.com') username ('us\\* er5') password('user)5') dbname ('def\\ault')";

    Config cfg = Parser(text).parse();

    for (Config::const_iterator setting = cfg.begin(); setting != cfg.end(); ++setting) {
        std::cout << "Key " << setting->first << " has value " << setting->second << "\n";
    }

    for (Config::const_iterator setting = cfg.begin(); setting != cfg.end(); ++setting) {
        std::cout << setting->first << "\n";
    }
}

Prints, as always:

Key dbname has value def\ault
Key password has value user)5
Key server has value m1.labs.terad  'ata.com
Key username has value us\* er5
dbname
password
server
username
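
The doubled-quote handling in parse_value can also be looked at in isolation. The following is a minimal sketch, not part of the original answer: `unquote` is a hypothetical helper name, and it assumes the input is exactly one quoted token with nothing before or after it.

```cpp
#include <string>

// Hypothetical helper: strips the surrounding quotes from `quoted` and turns
// each doubled quote into one literal quote. Returns false if the input does
// not start with `quote`, lacks a closing quote, or has trailing characters.
// `out` may hold partial content when false is returned.
bool unquote(std::string const& quoted, char quote, std::string& out) {
    std::string::const_iterator f = quoted.begin(), l = quoted.end();
    if (f == l || *f != quote)
        return false;
    ++f;
    out.clear();
    while (f != l) {
        char const ch = *f++;
        if (ch != quote) {
            out += ch;
        } else if (f != l && *f == quote) {
            out += quote; // doubled quote -> one literal quote
            ++f;
        } else {
            return f == l; // closing quote; reject trailing characters
        }
    }
    return false; // ran out of input before the closing quote
}
```

Called as `unquote("'us ''er5'", '\'', value)`, this leaves `us 'er5` in `value`, which matches what the parser stores for a doubled quote.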

¹ see

  1. avoid empty token in cpp
  2. extracting whitespaces using regex in cpp
  3. Regex to extract value between a single quote and parenthesis using boost token iterator
  4. tokenizing string , accepting everything between given set of characters in CPP
  5. extract a string with single quotes between parenthesis and single quote
  6. Extract everything apart from what is specified in the regex
  7. this one
sehe
-1

Replace Substring with Substring in a String Using a For Loop

Here we replace a substring with another substring and return the amended string. We pass in the string to be changed, the string we want to find and the string we want to replace it with (s, s_to_replace and s_replace, respectively).

std::string::find searches for the substring passed in and returns the index of its first character, not an iterator. std::string::npos is the largest value that std::string::size_type can hold; find returns it to signal that there is no match. std::string::erase takes a starting position and the number of characters to erase. std::string::insert takes the position at which to insert and the string to insert.
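
The calls described above can be tried on a single occurrence first. This is only a sketch: `replace_first` is a hypothetical name introduced for illustration, and it handles just the first match.

```cpp
#include <string>

// Hypothetical helper: replaces only the first occurrence of `needle` in `s`.
// Returns true if an occurrence was found.
bool replace_first(std::string& s, std::string const& needle, std::string const& with) {
    // find() returns an index of type std::string::size_type
    std::string::size_type pos = s.find(needle);
    if (pos == std::string::npos) // npos signals "no match"
        return false;
    s.erase(pos, needle.length()); // remove the matched characters
    s.insert(pos, with);           // put the replacement in the same place
    return true;
}
```

With `s` holding `us ''er5`, calling `replace_first(s, "''", "'")` returns true and leaves `us 'er5` in `s`.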

#include <string>

std::string replace_substring(std::string s, const std::string& s_to_replace, const std::string& s_replace) {
    // An empty search string matches at every position, so the loop would never end.
    if (s_to_replace.empty()) return s;

    for (std::string::size_type position = 0; ; position += s_replace.length()) {
        position = s.find(s_to_replace, position);

        if (position == std::string::npos) break;

        s.erase(position, s_to_replace.length());
        s.insert(position, s_replace);
        // Equivalent single call: s.replace(position, s_to_replace.length(), s_replace);
    }
    return s;
}
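
Applied to the tokens from the question, the same loop could look like the sketch below. The names `collapse_doubled_quotes` and `collapse_all` are hypothetical, and the sketch uses a single std::string::replace call instead of erase followed by insert. It assumes every token has a non-empty value: a token that is just `''` would itself be collapsed to `'`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: replaces every doubled quote ('') in `token` with one (').
std::string collapse_doubled_quotes(std::string token) {
    std::string const from = "''", to = "'";
    // Resume searching after the inserted quote so it is never matched again.
    for (std::string::size_type pos = 0;
         (pos = token.find(from, pos)) != std::string::npos;
         pos += to.length())
        token.replace(pos, from.length(), to);
    return token;
}

// Applies the helper to every token in the vector, in place.
void collapse_all(std::vector<std::string>& tokens) {
    for (std::vector<std::string>::iterator it = tokens.begin(); it != tokens.end(); ++it)
        *it = collapse_doubled_quotes(*it);
}
```

Calling `collapse_all(arg_values)` on the vector from the question would turn `'us ''er5'` into `'us 'er5'` and leave tokens without a doubled quote unchanged.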

Replace Substring with Substring in a String Using Boost

#include <boost/algorithm/string/replace.hpp>

boost::replace_all(s, s_to_replace, s_replace);
PeskyPotato
  • This cannot work, you are replacing a single character with another single character, while OP actually wants to drop a character. – Matteo Italia Jul 24 '17 at 05:03
  • @matteo-italia I guess I misunderstood the question, I've appended a corrected solution now. – PeskyPotato Jul 24 '17 at 05:25
  • I don't understand, why the downvotes? Does this not answer OP's question? – PeskyPotato Jul 24 '17 at 05:51
  • Maybe because it starts with something that does not answer OP's question? – juanchopanza Jul 24 '17 at 06:02
  • I've taken the solution out now, I did put a header on it after I found out it wasn't what OP was asking about so it would be obvious. – PeskyPotato Jul 24 '17 at 06:04
  • 1
    1) The function return type should be `std::string`. 2) You have a syntax error with `s_to_replace.search`. 3) The `erase/insert` can be done with a single call to `std::string::replace`. 4) You need to check if `s_to_replace` is empty to avoid an infinite loop. 5) See [here](https://stackoverflow.com/a/3418285/445976) for a similar implementation. – Blastfurnace Jul 24 '17 at 06:21
  • 1
    Note that `erase/insert` will work but I think it has a performance bug. The `erase` makes a hole and all the following characters are moved left to fill it. Then `insert` moves all the same characters to the right to open a hole for the new string. A single `std::string::replace` should cut that work in half. – Blastfurnace Jul 24 '17 at 06:36
  • Thanks @Blastfurnace I've made the corrections, but I've added `std::string::replace()` in comments for now since I have not had a chance to try it out, nor have I used it before. – PeskyPotato Jul 24 '17 at 06:44
  • I should have been clearer, you need to check `s_to_replace` for empty. `std::string::find` always matches an empty string at position <= s.size(). It leads to a bug in this code. – Blastfurnace Jul 24 '17 at 07:20
  • So if `s_to_replace` is empty I should break before I call `find()`? – PeskyPotato Jul 24 '17 at 07:23
  • The problem is `find` always matches an empty string at the given position (<= s.size()). So the loop inserts `s_replace`, increment the position, inserts `s_replace`, increments the position, etc. It gobbles memory until it crashes. I would check for empty before the loop and return early. – Blastfurnace Jul 24 '17 at 07:44