3

Suppose I have a string

Max and Bob and Merry and {Jack and Co.} and Lisa.

I need to split it with `and` as the delimiter, but only where it does not occur within curly braces.

So from the above string I should get 5 strings:
Max, Bob, Merry, Jack and Co., Lisa.

I tried something like this pattern:

[^\\\{.+]\\band\\b[^.+\\\}]

But it doesn't work: "Jack and Co." is still split as well. (I use C++, so I have to escape special characters twice.)

Meyer
Maximko
  • Are lookaheads supported by QRegExp? If so, try [`\\band\\b(?![^{]*})`](https://regex101.com/r/0Fd1tF/1); it might need more escaping. – bobble bubble Dec 11 '16 at 13:21
  • In C++, you can use a *raw string literal* for regular expressions, enclosed by `R"(` and `)"`. This way, backslashes can be used directly, e.g. `R"(\d*)"` – Meyer Dec 11 '16 at 13:29
  • You want to split under too many conditions; matching in 2 steps may work better: 1) extract what is inside braces with `QRegExp("\\{([^{}]*)\\}")` and 2) split with `"\\{[^{}]*\\}|\\s*\\band\\b\\s*"` – Wiktor Stribiżew Dec 11 '16 at 13:47
  • bobble bubble, thanks, that seems to work exactly as expected. (Yes, lookaheads are supported in QRegExp, and QRegularExpression supports lookbehinds as well.) – Maximko Dec 11 '16 at 14:03

3 Answers

2

If QRegExp supports lookaheads, you can check whether a match is inside braces by looking ahead from the final word boundary: if a closing } follows with no opening { in between, the `and` is inside braces and the negative lookahead rejects it.

\band\b(?![^{]*})

See this demo at regex101

The backslashes need to be escaped for a C++ string literal, or you can use a raw string literal as @SMeyer commented.

bobble bubble
1

Here is a possible solution, partially based on the comment by bobble-bubble. It will produce the five strings as requested, without surrounding whitespace or curly brackets.

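#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "Max and Bob and Merry and {Jack and Co.} and Lisa";
    // Delimiter: " and " plus an adjacent } or {, unless a } follows with no { before it.
    std::regex re(R"(\}? +and +(?![^{]*\})\{?)");

    // Submatch index -1 selects the text between delimiter matches.
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1);
    std::sregex_token_iterator end;

    while (it != end)
        std::cout << *it++ << std::endl;
}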

I tried to keep it simple; you might want to replace the spaces around `and` with full whitespace detection (`\s+`). An interactive version is available here.

Meyer
0

Let the {...} part match first. That is, put it on the left side of |.

\{.*?\}|and

That will match {foo and bar} as a whole if possible; only if that fails at the current position will it try to match `and`.

Waxrat