I need to find a pattern in a string (e.g. the character '}' or anything else), but the pattern may also occur inside a quoted string, and naturally I don't want my regex to capture it there.

Example:

bla bla } bla bla      <-- Capture.
bla "bla bla" } bla    <-- Capture.
bla bla } "bla bla"    <-- Capture.
bla "bla } bla" bla    <-- DON'T capture.
bla } bla "bla } bla"  <-- Capture the first, but DON'T capture second.

I need to accomplish this using C++14 std::regex (so no lookbehind).

I gathered some inspiration from related questions, but none of them fully solves my problem, and I have not been clever enough to solve it myself.

There is not much out there about C++ regexes, and even considering PHP, JavaScript, Perl, etc., I cannot find an answer.

Any help will be greatly appreciated. Thanks in advance.

j4x
  • I doubt you match them to collect extracted braces. Are you removing them or replacing with something? Another question is, do you need to support escaped quotes? What do you need to turn `bla \\"} bla "bla "bla } \"bla\""` into? – Wiktor Stribiżew Jul 06 '18 at 17:40
  • What do you want to do for (e.g.) `bla } bla "bla } bla" bla } bla`? Capture the 1st and 3rd? In that case, you _may_ have to loop/parse even with a regex that does the quoted/non-quoted case à la your first link – Craig Estey Jul 06 '18 at 17:46
  • If you need to remove those patterns outside of quotes in strings without escape sequences, it becomes a very simple task. – Wiktor Stribiżew Jul 06 '18 at 17:47
  • Thanks for your answer @WiktorStribiżew. This pattern simply tells me to stop; I don't really need to process it (no removals or replacements). Well... there can actually be escaped quotes _inside_ quoted text, but not outside. – j4x Jul 06 '18 at 17:50
  • Aha, then I suggest doing it like this: 1) remove all substrings inside quotes, try a simple `regex_replace` with `std::regex reg("\"[^\"]*\"")` pattern, and then 2) check if `{` or `}` is present in the resulting string. If it is a single char, you do not even need a regex for the second operation. – Wiktor Stribiżew Jul 06 '18 at 17:52
  • Thanks @CraigEstey. Yes, you're correct. I'd have two matches that I could iterate over if needed (but actually I don't need to; I just need to know they occur). The point is to arrive at a regex that matches my pattern only if it is not enclosed in quotes. – j4x Jul 06 '18 at 17:52
  • I'll give your suggestion a try. I am only cautious about performance since this code must run on a somewhat constrained embedded system. I'll tell you in minutes. Thanks! – j4x Jul 06 '18 at 17:55
  • Does this **have** to be regex? Why don't you just walk the text 1 char at a time? – Tezra Jul 06 '18 at 17:55
  • I would avoid regex in this case if it's a constrained system. Regex does not support context based matching, and hacking it to do so will be far more expensive than normal. You should stream the text, and parse it yourself. (Or with a parse API that supports streaming if you are using a standard format) – Tezra Jul 06 '18 at 17:59
  • Could you extract a vector of pointers to the regions of the text that occur **outside** quotes (easier to detect) and then search the text regions pointed to in that vector? – Galik Jul 06 '18 at 19:16
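
Two of the ideas in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested production solution: the function names `find_unquoted` and `find_unquoted_brace_regex` are hypothetical, quotes are assumed to be double quotes only, and backslash escapes are assumed to matter only inside quoted text (as the asker describes). The first function walks the text one character at a time; the second uses a single `std::regex` that matches either a whole quoted string or the target, and only counts matches where the capture group for the target took part.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: positions of `target` that lie outside double-quoted text.
// Assumption: a backslash escapes the next character only while inside quotes.
std::vector<std::size_t> find_unquoted(const std::string& text, char target) {
    std::vector<std::size_t> hits;
    bool in_quotes = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (in_quotes) {
            if (c == '\\' && i + 1 < text.size()) {
                ++i;  // skip the escaped character (an escaped quote or backslash)
            } else if (c == '"') {
                in_quotes = false;  // closing quote
            }
        } else if (c == '"') {
            in_quotes = true;  // opening quote
        } else if (c == target) {
            hits.push_back(i);  // target seen outside quotes
        }
    }
    return hits;
}

// Hypothetical regex variant for '}' only, using no lookbehind.
// The first alternative consumes a complete quoted string (with escapes);
// the second captures a '}' in group 1. Only group-1 matches are reported.
std::vector<std::size_t> find_unquoted_brace_regex(const std::string& text) {
    static const std::regex re(R"re("(?:[^"\\]|\\.)*"|(\}))re");
    std::vector<std::size_t> hits;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it) {
        if ((*it)[1].matched) {
            hits.push_back(static_cast<std::size_t>(it->position(1)));
        }
    }
    return hits;
}
```

For the last example in the question, `find_unquoted("bla } bla \"bla } bla\"", '}')` should report a single position, the first brace. The two sketches can disagree on malformed input: with an unterminated quote, the scanner treats the rest of the line as quoted, while the regex skips the lone quote and keeps matching. Since only the presence of the pattern matters here, the scanner could also return as soon as it finds the first hit, which may suit a constrained target better than running the regex engine.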

0 Answers