
Is it possible to prevent Boost's escaped_list_separator from consuming quotes inside a quoted token? Or are there any other ready-to-use constructs to achieve this behavior?
The inner quotes cannot be escaped, as the grammar doesn't support that and is defined by a third party.

Example:

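#include <boost/tokenizer.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string input("ID=abcde;PARAM={this;{is};quoted}");
    // escape: none, separator: ';', quotes: '{' or '}' (each one toggles quoting on/off)
    boost::escaped_list_separator<char> separator("", ";", "{}");
    boost::tokenizer<boost::escaped_list_separator<char>> tokenizer(input, separator);
    for(const auto &token : tokenizer)
    {
        std::cout << token << std::endl;
    }
}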

This yields

ID=abcde
PARAM=this;is;quoted

but I need

ID=abcde
PARAM=this;{is};quoted
sigy
  • Pre-process the input string and convert the inner 'quotes' (which appear to be braces) to something else? Then convert them back after. – Paul Sanders Jul 22 '22 at 15:31
  • @PaulSanders Detecting which character is a real quotation character and which isn't and should be replaced would require parsing the string by hand, wouldn't it? I could then tokenize the string by hand anyway, which I would like to avoid. Or am I missing something? – sigy Jul 22 '22 at 15:35
  • I would think the sort of pre-processing I'm talking about would be less work than parsing the entire string yourself - if you can trust that the string is well-formed, you just need to walk through the string keeping a count of unclosed braces. – Paul Sanders Jul 22 '22 at 15:39

1 Answer


UPDATE: Given the context of MSODBC connection strings, see the update below.

Don't tokenize if you want to parse.

I'll make some assumptions:

  • you want to parse into a map of key/value pairs (like {"ID","abcde"})
  • the nested {} braces are not to be ignored, but must be balanced (in that respect it's weird that they're not interpreted, but maybe you're just not showing the real purpose of the code)

Example: Spirit X3

Live On Compiler Explorer

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>  // for std::pair support
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

using Map = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;

namespace Grammar {
  using namespace boost::spirit::x3;

  auto entry  = rule<struct Entry_, Entry>{"entry"};
  auto quoted = rule<struct Quoted_, std::string>{"quoted"};

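  auto key        = +~char_("=;");
  // raw[] here is the x3::raw directive (exposes the matched input, inner braces included);
  // the `raw` variable declared next parses an unbraced value and hides x3::raw from then on
  auto quoted_def = '{' >> raw[ *(quoted | +~char_("{}")) ] >> '}';
  auto raw        = *~char_(";");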

  auto value      = quoted | raw;
  auto entry_def  = key >> '=' >> value;

  BOOST_SPIRIT_DEFINE(quoted, entry)
   
  auto full = entry % ';' >> eoi;
} // namespace Grammar

Map parse_map(std::string_view sv) {
  Map m;

  if (!parse(sv.begin(), sv.end(), Grammar::full, m))
    throw std::runtime_error("Parse error");

  return m;
}

#include <fmt/ranges.h>
int main() {
  auto m = parse_map("ID=abcde;PARAM={this;{is};quoted}");
  fmt::print("Result: {}\n", m);
}

Prints

Result: {"ID": "abcde", "PARAM": "this;{is};quoted"}

UPDATE: MSODBC Connection Strings

Going from the scant documentation here (https://learn.microsoft.com/en-us/dotnet/api/system.data.odbc.odbcconnection.connectionstring):

Applications do not have to add braces around the attribute value after the Driver keyword unless the attribute contains a semicolon (;), in which case the braces are required. If the attribute value that the driver receives includes braces, the driver should not remove them but they should be part of the returned connection string.

A DSN or connection string value enclosed with braces ({}) that contains any of the characters []{}(),;?*=!@ is passed intact to the driver. However, when you use these characters in a keyword, the Driver Manager returns an error when you work with file DSNs, but passes the connection string to the driver for regular connection strings. Avoid using embedded braces in a keyword value.

It follows that a braced value is only ended by } if it appears right before ; or at the end of the connection string, so basically:

auto braced  = '{'  >> *(char_ - ('}' >> (eoi | ';'))) >> '}';

To also retain the original bracing status (so that braces received by the driver can be kept as part of the returned connection string, as the first quote requires), I'd do this:

Live On Compiler Explorer

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

struct Value {
    bool braced;
    std::string value;
};
using Map = std::map<std::string, Value>;

BOOST_FUSION_ADAPT_STRUCT(Value, braced, value)

namespace Grammar {
  using namespace boost::spirit::x3;

  // only to coerce attribute type, no rule recursion needed anymore:
  template <typename T>
  auto as = [](auto p) { return rule<struct _, T>{"as"} = p; };

  auto key     = +~char_("=;");
  auto braced  = '{'  >> *(char_ - ('}' >> (eoi | ';'))) >> '}';
  auto raw     = *~char_(";");
  auto value   = as<Value>(matches[&lit('{')] >> (braced | raw));
  auto entry   = key >> '=' >> value;
  // -entry allows empty entries, e.g. the trailing ';' in "...;Database=AdventureWorks;"
  auto connstr = -entry % ';' >> eoi;
} // namespace Grammar

Map parseConnectionString(std::string_view sv) {
  Map m;

  if (!parse(sv.begin(), sv.end(), Grammar::connstr, m))
    throw std::runtime_error("Parse error");

  return m;
}

int main() {
    for (
        auto connectionString : {
            R"(DSN=dsnname)",
            R"(Driver={Microsoft Access Driver (*.mdb)};DBQ=c:\bin\Northwind.mdb)",
            R"(Driver={Microsoft Excel Driver (*.xls)};DBQ=c:\bin\book1.xls)",
            R"(Driver={Microsoft ODBC for Oracle};Server=ORACLE8i7;Persist Security Info=False;Trusted_Connection=Yes)",
            R"(Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=c:\bin)",
            R"(Driver={SQL Server};Server=(local);Trusted_Connection=Yes;Database=AdventureWorks;)",
            R"(ID=abcde;PARAM={this;{is;quoted})",
            R"(ID=abcde;PARAM={this;{i}s;}s;quoted})", // all fine even if unbalanced
            //
            R"(ID=abcde;PARAM={this;{is}};quoted})", // parse error because of early };
        })
    try {
        std::cout << connectionString << std::endl;
        for (auto& [k, v] : parseConnectionString(connectionString))
        {
            std::cout << " -> " << k << ": " << v.value
                      << (v.braced ? " (braced)" : " (raw)") << std::endl;
        }
    } catch(std::exception const& e) {
        std::cout << " -> " << e.what() << std::endl;
    }
}

Which prints the expected outcome:

DSN=dsnname
 -> DSN: dsnname (raw)
Driver={Microsoft Access Driver (*.mdb)};DBQ=c:\bin\Northwind.mdb
 -> DBQ: c:\bin\Northwind.mdb (raw)
 -> Driver: Microsoft Access Driver (*.mdb) (braced)
Driver={Microsoft Excel Driver (*.xls)};DBQ=c:\bin\book1.xls
 -> DBQ: c:\bin\book1.xls (raw)
 -> Driver: Microsoft Excel Driver (*.xls) (braced)
Driver={Microsoft ODBC for Oracle};Server=ORACLE8i7;Persist Security Info=False;Trusted_Connection=Yes
 -> Driver: Microsoft ODBC for Oracle (braced)
 -> Persist Security Info: False (raw)
 -> Server: ORACLE8i7 (raw)
 -> Trusted_Connection: Yes (raw)
Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=c:\bin
 -> DBQ: c:\bin (raw)
 -> Driver: Microsoft Text Driver (*.txt; *.csv) (braced)
Driver={SQL Server};Server=(local);Trusted_Connection=Yes;Database=AdventureWorks;
 -> Database: AdventureWorks (raw)
 -> Driver: SQL Server (braced)
 -> Server: (local) (raw)
 -> Trusted_Connection: Yes (raw)
ID=abcde;PARAM={this;{is;quoted}
 -> ID: abcde (raw)
 -> PARAM: this;{is;quoted (braced)
ID=abcde;PARAM={this;{i}s;}s;quoted}
 -> ID: abcde (raw)
 -> PARAM: this;{i}s;}s;quoted (braced)
ID=abcde;PARAM={this;{is}};quoted}
 -> Parse error
sehe
  • Thank you for your suggestion and your effort in giving an example. Your first assumption is correct. But the nested braces don't need to be balanced. In fact, it is a password and thus can contain arbitrary characters. The real purpose is to parse an ODBC connection string to extract username, password and other connection info. I don't have experience with Boost.Spirit yet, but I will look into it. – sigy Jul 25 '22 at 07:58
  • The example is contradictory, then. There is /no way/ we can make it parse the expected results _unless_ (a) we assume that the braces were balanced (as your given example misleadingly suggested) or (b) you have a way to escape the quote (`}`) inside the quoted construct. This is classic [Pigeon Hole Principle](https://en.wikipedia.org/wiki/Pigeonhole_principle). – sehe Jul 25 '22 at 15:56
  • The good news is that without recursive grammar things are much simpler: e.g. or [using `}}` to denote a single `}`](https://godbolt.org/z/WvzGbGP5n) (like "%%" in printf) or [using generalized '\' escaping](https://godbolt.org/z/Waje6fhax). Obviously I recommend the latter – sehe Jul 25 '22 at 16:10
  • Unfortunately, I have no control over the grammar. This is how Windows stores ODBC connection strings. – sigy Jul 26 '22 at 08:58
  • Ah. You could have just told us. Going from the scant documentation [here](https://learn.microsoft.com/en-us/dotnet/api/system.data.odbc.odbcconnection.connectionstring?redirectedfrom=MSDN&view=netframework-4.7.2#System_Data_Odbc_OdbcConnection_ConnectionString:~:text=Applications%20do%20not,returned%20connection%20string) I'd do this: https://godbolt.org/z/nxE6MbaoW – sehe Jul 26 '22 at 12:44
  • Also updated the answer with some more explanation. – sehe Jul 26 '22 at 12:49
  • @sehe Regarding your last case which errors out. PARAM could be a password in an ODBC connection string like `this;{is}};quoted`. In that case `PWD={{this;{is}};quoted}` should parse without error, right? If yes, how can I modify the grammar to do so? – rahman Jan 11 '23 at 08:25
  • @sehe I mean your comment: `It follows that a braced value is only ended by } if it appears right before ; ...`. Was it part of the document, or an inference? Also, shouldn't we consider occasional whitespaces in between? `PWD={{this;{is};quoted} ; PARAM=abc;` – rahman Jan 11 '23 at 08:32
  • I said "it follows" immediately after quoting the documentation, which is _also_ linked. I could read it again, but maybe you should state your reasons for drawing any other conclusion? – sehe Jan 11 '23 at 08:45
  • @sehe The linked document ends after `Avoid using embedded braces in a keyword value.`. Mine is actually more an inference/understanding from the document, not a conclusion. I am not sure if parsing `{{this;{is};quoted};` to `{this;{is};quoted` is a valid/standard expectation. But 1.Suppose I am right. Is it even programmatically achievable? How? 2. About whitespaces, do you also think `{{this;{is};quoted} ;` is not a valid expectation? If valid, how would you modify your grammar to skip the whitespace? Thanks – rahman Jan 15 '23 at 15:42
  • It was an inference based on _"A DSN or connection string value enclosed with braces ({}) that contains any of the characters []{}(),;?*=!@ is passed intact to the driver"_. Taking it literally means that neither the order nor the balancing should matter. – sehe Jan 15 '23 at 19:19
  • "But 1.Suppose I am right." - assuming about the PWD example, yes you can "make it so". Re: Also, shouldn't we consider occasional whitespaces in between? - I have no clue. There's no mention of it in the linked document. As a rule, make parsers as simple as possible. – sehe Jan 15 '23 at 19:25
  • Side note: on re-reading the document I notice that the optional `'{'`/`'}'` characters are ONLY (badly) specified in the grammar production for the `DRIVER` attribute, specifically. Given both the poor grammar production *and* the contradiction with the more general _"A DSN or connection string value enclosed with braces ({})"_, I conclude that we don't know the specs, and I cannot really help you guess. If you have made up your mind and need help with a specific interpretation/set of chosen requirements, go ahead and ask in a separate question. I'll be happy to help out with the implementation – sehe Jan 15 '23 at 19:25
  • Thanks @sehe. I also think there is a contradiction. One thing I am almost sure of so far, based on what I see in other ODBC source code, is that bracing values that include special characters is not limited to `DRIVER`. I will revise my requirements and ask in a separate question. – rahman Jan 15 '23 at 20:17
  • @sehe I created another question as you requested. Could you please post your extended solution there? Thanks: https://stackoverflow.com/questions/75212558/standard-odbc-connection-string-parseing-with-special-characters – rahman Mar 19 '23 at 11:51