Is it possible to match text outside quotation marks using a standard regex engine? I have seen this answer, but it relies on PCRE:

Can regex match all the words outside quotation marks?

That is not a pure regex solution because it relies on Perl-specific features. I know the problem can also be solved with a programming language, but the idea is to use nothing beyond a plain regex engine.

I came up with something like this, but it does not work correctly:

[^'"]*(?=(?:(['"])+(.*?\1))|([^'"]*$))

Thank you in advance.

UPD1: The idea is to match any kind of text outside quotation marks; the solution must not depend on the input.

Wiktor Stribiżew
dzharvis
  • I don't believe there's a single regex that would work for *all* the languages. There will be at least one platform that doesn't support a particular feature. – Amal Murali Oct 28 '14 at 13:38
  • __Warning:__ don't use a regex to write a parser for a programming language unless you know _exactly_ what you are doing. Common pitfalls: (1) `"` inside a code comment mistaken for the start of a string literal. (2) `/*` or `//` inside a string literal mistaken for the start of a code comment. (3) `\"` inside a string literal mistaken for the end of the string literal. (4) Failure to recognize tokens inside the placeholders of an [interpolated string](https://en.wikipedia.org/wiki/String_interpolation). – Ruud Helderman Aug 23 '22 at 14:22

3 Answers

<yourtext>(?=(?:[^"]*"[^"]*")*[^"]*$)

Yes, you can do it using a positive lookahead. This assumes the " characters are balanced and there is no stray " anywhere in the input. See the demo:

http://regex101.com/r/sU3fA2/29
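
As a rough illustration of the lookahead above, here is a minimal Python sketch. Using `\w+` in place of the `<yourtext>` placeholder and the helper name `words_outside_quotes` are assumptions made for this example, not part of the answer.

```python
import re

# Lookahead from the answer: what follows the match must contain
# an even number of double quotes up to the end of the input.
OUTSIDE_QUOTES = r'(?=(?:[^"]*"[^"]*")*[^"]*$)'


def words_outside_quotes(text):
    """Return words that are followed by an even number of quotes."""
    # \w+ is an assumed stand-in for the <yourtext> placeholder.
    return re.findall(r'\w+' + OUTSIDE_QUOTES, text)


if __name__ == "__main__":
    print(words_outside_quotes('foo "bar baz" qux'))
```

On the balanced input shown, only `foo` and `qux` are returned. With a stray quote, as in `foo "bar`, the parity check misfires: `foo` is rejected and `bar` is returned, which is the limitation the answer mentions.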

vks

I came up with this solution:

(?:[^"](?=(?:[^"]*?(?:["][^"]*?["][^"]*?)+$)|(?:[^"]*?$)))*|(^[^"]*["][^"]*$)

http://regex101.com/r/pI8xA4/2

It does not work well when the input contains an odd number of quotes: in that case it skips the first quote. Still, it is the best solution I have for now.
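
To see what this pattern returns on a balanced input, here is a small Python sketch. The helper name `non_empty_matches` is hypothetical, and dropping empty matches is an assumption of the example: the leading `(...)*` alternative can match the empty string at positions inside quotes.

```python
import re

# The pattern from this answer, copied unchanged.
PATTERN = re.compile(
    r'(?:[^"](?=(?:[^"]*?(?:["][^"]*?["][^"]*?)+$)|(?:[^"]*?$)))*|(^[^"]*["][^"]*$)'
)


def non_empty_matches(text):
    """Collect the non-empty matches of PATTERN in text."""
    # Empty matches (produced where the starred group matches nothing)
    # are filtered out; this filtering is an assumption of the sketch.
    return [m.group(0) for m in PATTERN.finditer(text) if m.group(0)]


if __name__ == "__main__":
    print(non_empty_matches('foo "bar" baz'))
```

For `foo "bar" baz` the non-empty matches are the two runs of text outside the quotes, including their adjacent spaces. Behavior with an odd number of quotes is left out here because it depends on how the engine handles empty matches.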

wp78de
dzharvis

This pattern will capture words outside double quotes:

"[^"]+"|(\S+) 

Demo

Or use this pattern to capture sentences outside double quotes; you will have to trim the extra spaces yourself:

"[^"]+"|([^"]+)

Demo
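
Both patterns rely on the same trick: the first alternative consumes a quoted run without capturing it, so only text outside quotes ends up in group 1. A minimal Python sketch of how the captures might be collected follows; the helper names `captured`, `words_outside`, and `chunks_outside` are made up for this example.

```python
import re

WORDS = re.compile(r'"[^"]+"|(\S+)')     # words outside quotes
CHUNKS = re.compile(r'"[^"]+"|([^"]+)')  # longer runs outside quotes


def captured(pattern, text):
    """Return group 1 of every match where group 1 took part."""
    # Matches of the quoted alternative leave group 1 as None; skip them.
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]


def words_outside(text):
    return captured(WORDS, text)


def chunks_outside(text):
    # Trim the surrounding spaces, as the answer says is required,
    # and drop chunks that were whitespace only.
    return [c.strip() for c in captured(CHUNKS, text) if c.strip()]


if __name__ == "__main__":
    print(words_outside('foo "bar baz" qux'))
    print(chunks_outside('one two "three four" five six'))
```

Because the quoted alternative uses `+`, an empty pair `""` is not consumed by it, and with the `\S+` variant a quote glued to a word without whitespace becomes part of that word.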

alpha bravo