I have text that contains a list of pairs. Each pair contains a list of data items and the connection of that data to some other data:

{data={data1,data2},connection=data3}, {{data4},data5}, {{data6},data7}, ...

I need to extract the data entries from each pair, i.e. data1, data2, data3, etc. This is the regexp I came up with:

(\{(?:\S*=)?\{\S+\})(?:,(?:\S*=)?\}\S+)?

regex101.com matches the pattern to the text and separates the string into these groups:

{data={data1,data2},connection=data3}, {{data4},data5}, {{data6},data7} (for each of which I'll need to run another regexp). However, my C++ code doesn't match the string:

#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string text{ "{data={data1,data2},connection=data3}, {{data4},data5}, {{data6},data7}" };

    const std::regex rx{ R"data((\{\S*\{\S+\})(?:,(\S*=)?\}\S+)?)data" };
    std::smatch matches;
    if (std::regex_match(text, matches, rx))
    {
        std::cout << matches.size() << std::endl;
    }

    system("pause");
}

How should I do this in C++?

user3132457
  • [std::regex_search](https://en.cppreference.com/w/cpp/regex/regex_search) instead of [std::regex_match](https://en.cppreference.com/w/cpp/regex/regex_match). – Jarod42 Jan 30 '19 at 17:21
  • Now it returns only the first group, and no matter how many pairs I have in text, `matches.size()` is 3. – user3132457 Jan 30 '19 at 17:29
  • Use `sregex_token_iterator`, I added the second link. – Wiktor Stribiżew Jan 30 '19 at 17:49
  • @WiktorStribiżew I'm still getting only the first group of results: https://wandbox.org/permlink/tsKVcCEK7ROSJCBO – user3132457 Jan 30 '19 at 17:56
  • Try `const std::regex rx{ R"(\w+(?=[^{}=]*}))" };` – Wiktor Stribiżew Jan 30 '19 at 18:18
  • With that the application crashes... – user3132457 Jan 30 '19 at 18:22
  • Well, https://ideone.com/fzh120 does not. However, C++ regex implementations differ, and it may crash due to various reasons. – Wiktor Stribiżew Jan 30 '19 at 18:38
  • Fair. But I not only need the `data`s, I also need to separate them into groups, so that I know which data belongs to which group (and to save the connections among them). So what I need is more like my regexp. How should I modify mine to get that result? – user3132457 Jan 30 '19 at 18:44
  • It is not possible with 1 regex pass. You need to get your groups first, something like with `\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}` ([demo](https://regex101.com/r/ZfAsTk/1)), and then use that other regex to get each separate value from the matches. – Wiktor Stribiżew Jan 30 '19 at 18:51
  • Yeah that's what I need actually. I found a shorter one: `std::regex rx{ R"(\{\s*(\S+)\})" }`. By the way, here I'm not capturing `{` and `}` , why do the results have them? – user3132457 Jan 30 '19 at 18:59
  • `\S` matches both `{` and `}`. – Wiktor Stribiżew Jan 30 '19 at 19:08
  • I see. How can I capture only what's inside the curly braces? (not super important, I just wanna know) – user3132457 Jan 30 '19 at 19:12
  • It is as easy as `\{([^{}]*)\}` if there are no balanced nested braces. If you need recursion, you can only use something like I showed above: 1 or 2 levels deep are feasible, but it becomes unwieldy with more depth levels. – Wiktor Stribiżew Jan 30 '19 at 20:01
  • That only captures the `data` parts, but not the second part of the pair. – user3132457 Jan 31 '19 at 06:50
