What you're looking for is regex recursion. That's not supported by C++'s regex engine (std::regex, which defaults to the ECMAScript grammar). So if you're going to parse a string with nested structure in C++, you'll either need Boost or you'll have to do it by hand.
Since I'd always encourage sticking to the standard library where possible, I'll show you how to do this without Boost.
We'll need two functions. The first one finds a non-escaped character:
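    template <typename T>
    T findNonEscaped(T start, T end, const char ch) {
        T result = find(start, end, ch);

        // Skip matches preceded by a backslash, resuming the search just past each one.
        // A match at start has no preceding character in the range, so it is never treated as escaped.
        while (result != end && result != start && result[-1] == '\\') result = find(next(result), end, ch);

        return result;
    }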
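To see what the helper does with an escaped character, here is a minimal sketch. It carries its own copy of findNonEscaped, written so that each retry resumes one past the escaped match, and nonEscapedOffset and checkFindNonEscaped are hypothetical names I made up for the demonstration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Copy of the helper so this block compiles on its own. Each retry resumes
// one past the escaped match; a match at start is never treated as escaped.
template <typename T>
T findNonEscaped(T start, T end, const char ch) {
    T result = std::find(start, end, ch);
    while (result != end && result != start && result[-1] == '\\') {
        result = std::find(std::next(result), end, ch);
    }
    return result;
}

// Hypothetical helper (not part of the answer): offset of the first
// non-escaped ch in text, or text.size() if there is none.
std::size_t nonEscapedOffset(const std::string& text, char ch) {
    const auto it = findNonEscaped(text.cbegin(), text.cend(), ch);
    return static_cast<std::size_t>(std::distance(text.cbegin(), it));
}

// Illustrative checks. Note that "\\(" in source code is a backslash followed by '('.
void checkFindNonEscaped() {
    assert(nonEscapedOffset("a\\(b(c", '(') == 4);  // the '(' at offset 2 is escaped
    assert(nonEscapedOffset("(x", '(') == 0);       // a match at the very start counts
    assert(nonEscapedOffset("a\\(", '(') == 3);     // only an escaped '(': returns end
}
```

Calling checkFindNonEscaped() from main runs the three checks.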
Second, we'll need a function that, given an opening parenthesis, finds its matching closing parenthesis while skipping over nested pairs:
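    template <typename T>
    T extractParenthesis(T start, T end) {
        T finish = findNonEscaped(start, end, ')');

        // Each '(' found before the current ')' opens a nested pair, so move on to the next ')'.
        // Stop once finish reaches end (unbalanced input) so we never step past it.
        for (auto i = findNonEscaped(next(start), end, '(');
             i != end && finish != end && i < finish;
             i = findNonEscaped(next(i), end, '(')) {
            finish = findNonEscaped(next(finish), end, ')');
        }
        return finish;
    }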
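A quick way to convince yourself that the nesting logic works is to run it on a small string. This is only a sketch: both helpers are copied in so the block compiles on its own (with a guard for unbalanced input), and matchingCloseOffset and checkExtractParenthesis are hypothetical names used just for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

template <typename T>
T findNonEscaped(T start, T end, const char ch) {
    T result = std::find(start, end, ch);
    while (result != end && result != start && result[-1] == '\\') {
        result = std::find(std::next(result), end, ch);
    }
    return result;
}

// start must point at an opening parenthesis. Returns the matching ')' or end.
template <typename T>
T extractParenthesis(T start, T end) {
    T finish = findNonEscaped(start, end, ')');
    // Every '(' seen before the current ')' means one more ')' must be skipped.
    for (auto i = findNonEscaped(std::next(start), end, '(');
         i != end && finish != end && i < finish;
         i = findNonEscaped(std::next(i), end, '(')) {
        finish = findNonEscaped(std::next(finish), end, ')');
    }
    return finish;
}

// Hypothetical helper (not part of the answer): offset of the ')' matching
// the first non-escaped '(' in text, or text.size() if there is no match.
std::size_t matchingCloseOffset(const std::string& text) {
    const auto open = findNonEscaped(text.cbegin(), text.cend(), '(');
    if (open == text.cend()) return text.size();
    const auto close = extractParenthesis(open, text.cend());
    return static_cast<std::size_t>(std::distance(text.cbegin(), close));
}

void checkExtractParenthesis() {
    assert(matchingCloseOffset("f(a(b)c)d") == 7);  // skips the inner pair
    assert(matchingCloseOffset("k(v) x(y)") == 3);  // later groups are ignored
    assert(matchingCloseOffset("f(a\\)b)") == 6);   // the escaped ')' is skipped
    assert(matchingCloseOffset("f(a(b") == 5);      // unbalanced: returns end
}
```

Note that the i < finish comparison means this needs random-access iterators, which string iterators are.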
Finally, given the input line:
    const auto input = "data1(\" value 1 \") data2 (\"value 2\") anything3(\" data3(\"value\") \")"s;
we can use those two functions to write this:
    map<string, string> output;

    for (auto openParenthesis = findNonEscaped(input.cbegin(), input.cend(), '('), closeParenthesis = input.cbegin();
         openParenthesis != input.cend();
         openParenthesis = findNonEscaped(openParenthesis, input.cend(), '(')) {
        // Read the text before '(' backwards so the first token extracted is the (reversed) key.
        decltype(output)::key_type key;
        istringstream ss{ string{ make_reverse_iterator(openParenthesis), make_reverse_iterator(closeParenthesis) } };
        ss >> key;

        closeParenthesis = extractParenthesis(openParenthesis, input.cend());

        // The value runs from just after the first '"' to just before the last '"' inside the parentheses.
        const auto valueBegin = next(findNonEscaped(next(openParenthesis), closeParenthesis, '"'));
        const auto valueEnd = prev(findNonEscaped(make_reverse_iterator(closeParenthesis), make_reverse_iterator(next(openParenthesis)), '"').base());
        output[decltype(output)::key_type{ key.crbegin(), key.crend() }] = decltype(output)::mapped_type{ valueBegin, valueEnd };
        openParenthesis = closeParenthesis;
    }
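If you want to try the whole thing in one piece, here is a sketch that wraps the loop in a function. The wrapper parsePairs and the checkParsePairs demo are names I'm assuming for illustration, and the early break on unbalanced parentheses is an added guard. It still assumes every parenthesized group contains a quoted value:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <sstream>
#include <string>

template <typename T>
T findNonEscaped(T start, T end, const char ch) {
    T result = std::find(start, end, ch);
    while (result != end && result != start && result[-1] == '\\') {
        result = std::find(std::next(result), end, ch);
    }
    return result;
}

template <typename T>
T extractParenthesis(T start, T end) {
    T finish = findNonEscaped(start, end, ')');
    for (auto i = findNonEscaped(std::next(start), end, '(');
         i != end && finish != end && i < finish;
         i = findNonEscaped(std::next(i), end, '(')) {
        finish = findNonEscaped(std::next(finish), end, ')');
    }
    return finish;
}

// Hypothetical wrapper: maps each key to the quoted text in its parentheses.
std::map<std::string, std::string> parsePairs(const std::string& input) {
    std::map<std::string, std::string> output;
    auto closeParenthesis = input.cbegin();

    for (auto openParenthesis = findNonEscaped(input.cbegin(), input.cend(), '(');
         openParenthesis != input.cend();
         openParenthesis = findNonEscaped(openParenthesis, input.cend(), '(')) {
        // The text before '(' read backwards: its first token is the reversed key.
        std::string key;
        std::istringstream ss{ std::string(std::make_reverse_iterator(openParenthesis),
                                           std::make_reverse_iterator(closeParenthesis)) };
        ss >> key;

        closeParenthesis = extractParenthesis(openParenthesis, input.cend());
        if (closeParenthesis == input.cend()) break;  // added guard: unbalanced input

        // Value: just after the first '"' up to just before the last '"'.
        const auto valueBegin =
            std::next(findNonEscaped(std::next(openParenthesis), closeParenthesis, '"'));
        const auto valueEnd =
            std::prev(findNonEscaped(std::make_reverse_iterator(closeParenthesis),
                                     std::make_reverse_iterator(std::next(openParenthesis)),
                                     '"').base());
        output[std::string(key.crbegin(), key.crend())] = std::string(valueBegin, valueEnd);
        openParenthesis = closeParenthesis;
    }
    return output;
}

void checkParsePairs() {
    const auto out = parsePairs(
        "data1(\" value 1 \") data2 (\"value 2\") anything3(\" data3(\"value\") \")");
    assert(out.size() == 3);
    assert(out.at("data1") == " value 1 ");
    assert(out.at("data2") == "value 2");
    assert(out.at("anything3") == " data3(\"value\") ");  // nested group kept verbatim
}
```

The key is recovered by reading backwards from the opening parenthesis, which is why whitespace between the key and the parenthesis (as in data2) doesn't matter.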
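This code is pretty resilient. The only defect I know of is that an invalid input like:
    const auto input = "key1(\"value1\"\"value2\")"s;
is not rejected. Instead, everything between the first and last quote is taken as a single value, so it will return:
    key1 : value1""value2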
I know some of this iterator functionality is a bit more... advanced. So if you have specific questions feel free to let me know in the comments.