
While trying to figure out how to parse an ODBC connection string containing special characters ([]{}(),;?), I'd like to know whether what I am trying to achieve is a valid requirement. The SQLDriverConnect documentation reads:

A DSN or connection string value enclosed with braces ({}) containing any of the characters []{}(),;?*=!@ is passed intact to the driver.

To me, this means the special characters ([]{}(),;?) are allowed, including curly braces and semicolons in between (at least in values).

So, is it valid to expect a PWD key (for example) to look like this: PWD={a{b}};c};?

And should its value be parsed to a{b}};c?

And if yes, can such a requirement even be achieved easily? Inspired by this solution, here is my code so far:

#include <iostream>    
#include <istream>
#include <string>
#include <vector>

enum class CSVState {
    UnquotedField,
    QuotedField
};

std::vector<std::string> readCSVRow(const std::string &row) {
    CSVState state = CSVState::UnquotedField;
    std::vector<std::string> fields {""};
    size_t i = 0; // index of the current field
    int depth = 0;

    for (size_t ii = 0; ii < row.size(); ++ii) { // size_t avoids a signed/unsigned comparison
        const char c = row[ii];
        switch (state) {
            case CSVState::UnquotedField:
                switch (c) {
                    case ';': // end of field
                              fields.push_back(""); i++;
                              break;
                    case '{': state = CSVState::QuotedField;
                              depth++;
                              break;
                    default:  fields[i].push_back(c);
                              break; 
                }
                break;
            case CSVState::QuotedField:
                switch (c) {
                    case '{': // nested opening brace: keep it and go one level deeper
                              depth++;
                              fields[i].push_back(c);
                              break;
                    case '}': 
                              depth--;
                              if (depth == 0) {
                                state = CSVState::UnquotedField;
                              } else {
                                fields[i].push_back(c);
                              }
                              break;
                    default:  fields[i].push_back(c);
                              break; 
                }
                break;
        }
    }
    std::cout << "fields: " << fields.size() << std::endl;
    return fields;
}

/// Read one line from the stream and split it into fields; braces quote a field.
std::vector<std::vector<std::string>> readCSV(std::istream &in) {
    std::vector<std::vector<std::string>> table;
    std::string row;
    std::getline(in, row);
    if (in.bad() || in.fail()) {
        std::cout << "bad\n";
        return {};
    }
    auto fields = readCSVRow(row);
    table.push_back(fields);
    for(auto& f : fields) {
        std::cout << "'" << f << "' ";
    }
    std::cout << std::endl;
    return table;
}

int main() {
     auto res = readCSV(std::cin);
}

For the input {a{b}};c}; it produces 'a{b}' 'c}' '', while I think it should be modified to produce a{b}};c.

Any clues how to do that?

Cosmin
rahman

1 Answer


The paragraph you mentioned refers to attribute values, not to the whole connection string. So for the string {a{b}};c}; the attributes, separated by semicolons, are {a{b}}, c} and ''.

A DSN or connection string value enclosed with braces

The key word here is value, meaning an attribute value within the string.

So any attribute value can be enclosed in braces to be passed as is, but not the entire string.
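To illustrate the difference between the whole string and its attribute values, here is a minimal sketch that splits a connection string into key/value pairs. The function name splitAttributes is hypothetical, and the sketch assumes the same brace-depth counting as the question's code: a semicolon ends an attribute only when it is outside braces. Braces are kept in the value as written.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of any ODBC API): splits
// "KEY=value;KEY={va;lue};..." into (key, value) pairs.
// Assumption: a ';' ends an attribute only when brace depth is 0.
std::vector<std::pair<std::string, std::string>>
splitAttributes(const std::string& connStr) {
    std::vector<std::pair<std::string, std::string>> attrs;
    std::string current;
    int depth = 0;

    // Store the attribute collected so far, split at its first '='.
    auto flush = [&]() {
        if (current.empty()) return;  // ignore empty attributes, e.g. after a trailing ';'
        std::size_t eq = current.find('=');
        if (eq == std::string::npos) {
            attrs.emplace_back(current, "");
        } else {
            attrs.emplace_back(current.substr(0, eq), current.substr(eq + 1));
        }
        current.clear();
    };

    for (char c : connStr) {
        if (c == '{') {
            ++depth;
        } else if (c == '}' && depth > 0) {
            --depth;
        }
        if (c == ';' && depth == 0) {
            flush();               // separator outside braces
        } else {
            current.push_back(c);  // everything else belongs to the attribute
        }
    }
    flush();
    return attrs;
}
```

For DRIVER={SQL Server};PWD={a;b};UID=me this yields three pairs, with the PWD value {a;b} kept in one piece because its semicolon sits inside braces.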

Edit: If you want to parse values in braces no matter what they contain, you can simply check the depth when encountering a semicolon:

case ';': // end of field
    if (depth == 0) {
        fields.push_back(""); i++;
    } else {
        fields[i].push_back(c);
    }
    
    break;

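This way, a semicolon is treated as a separator only when it is not inside braces. Note that depth counting sees {a{b}} as a balanced group, so for {a{b}};c}; the first semicolon is already outside the braces and the result is still 'a{b}', 'c}' and ''. A value such as {a{b};c};, where the braces are balanced, is parsed as a{b};c and ''.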
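If the goal is to carry a literal } inside a braced value, depth counting alone cannot tell a nested closing brace from the one that ends the value. One possible way out is a different convention, which some drivers document: a closing brace inside a braced value is escaped by doubling it (}}), and the first single } ends the value. Whether your driver follows that convention is an assumption you would need to check. The helper name readBracedValue below is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes one braced value starting at input[pos].
// Assumption: "}}" inside the value stands for a literal '}', and the
// first single '}' closes the value. '{' and ';' inside are literal.
// On success, stores the decoded value in out, moves pos just past the
// closing '}', and returns true. Returns false if input[pos] is not '{'
// or the value is never closed.
bool readBracedValue(const std::string& input, std::size_t& pos, std::string& out) {
    if (pos >= input.size() || input[pos] != '{') return false;

    std::string value;
    std::size_t i = pos + 1;
    while (i < input.size()) {
        char c = input[i];
        if (c == '}') {
            if (i + 1 < input.size() && input[i + 1] == '}') {
                value.push_back('}');  // "}}" -> literal '}'
                i += 2;
                continue;
            }
            pos = i + 1;               // single '}' closes the value
            out = value;
            return true;
        }
        value.push_back(c);
        ++i;
    }
    return false;                      // unterminated value
}
```

Under this assumption, {a{b}};c} decodes to a{b};c, and the password a{b}};c from the question would have to be written as {a{b}}}};c}.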

Cosmin
  • The point is, the first semicolon is meant to be part of the attribute value. In the end, one needs to extract the whole password as: `a{b}};c` – rahman Jan 11 '23 at 07:33
  • @rahman, I think I understand what you want. Please see my edit. – Cosmin Jan 11 '23 at 21:51