
I want to write a regex that matches everything except the text between quotes. For example:

 Lorem ipsum "dolor" sit amet, consectetur "adipiscing" elit.
 Nunc ultrices varius odio, "ut accumsan nisi" aliquet vitae.
 "Ut faucibus augue tortor, at aliquam purus dignissim eget."

So I want a regex that matches the following strings:

  • Lorem ipsum
  • sit amet, consectetur
  • elit. Nunc ultrices varius odio,
  • aliquet vitae.

I only have the following expression that matches substrings inside quotes:

([\"'])(?:\\\1|.)*?\1
Wiktor Stribiżew
Gotrank

2 Answers


This regex works (the text outside the quotes ends up in capturing group 1):

([^"]+?)(".*?"|$)

https://regex101.com/r/um9TEx/3
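
As a rough illustration of how this pattern could be applied, here is a minimal Python sketch using the standard `re` module. The variable names and the whitespace clean-up step are assumptions made for the example, not part of the answer; the sample text is the one from the question.

```python
import re

# Sample text from the question.
text = (
    'Lorem ipsum "dolor" sit amet, consectetur "adipiscing" elit.\n'
    'Nunc ultrices varius odio, "ut accumsan nisi" aliquet vitae.\n'
    '"Ut faucibus augue tortor, at aliquam purus dignissim eget."'
)

# Group 1: text outside quotes. Group 2: the quoted part that follows it,
# or an empty string when the end of the input is reached.
pattern = re.compile(r'([^"]+?)(".*?"|$)')

# Collapse runs of whitespace (including the line break) in each group 1 value.
unquoted = [" ".join(m.group(1).split()) for m in pattern.finditer(text)]

print(unquoted)
# ['Lorem ipsum', 'sit amet, consectetur', 'elit. Nunc ultrices varius odio,', 'aliquet vitae.']
```

Note that the whole match still includes the quoted part, so only group 1 should be read.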

1st Capturing Group ([^"]+?)
  [^"] matches a single character other than "
  +? Quantifier: matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
2nd Capturing Group (".*?"|$)
  1st Alternative ".*?"
    " matches the character " literally
    .*? matches any character (except for line terminators) between zero and unlimited times, as few times as possible, expanding as needed (lazy)
    " matches the character " literally
  2nd Alternative $
    $ asserts position at the end of the string (or of a line, if the multiline flag is set)
Duc Filan

If you are using PCRE, you may use

([\"'])(?:\\.|(?!\1)[^\\])*?\1(*SKIP)(*F)|(?:[^\\"']|\\.)+

See its demo.

Details

  • ([\"'])(?:\\.|(?!\1)[^\\])*?\1 - a "..." or '...' substring with escaped quote support:
    • ([\"']) - Group 1 (referred to with \1): a " or '
    • (?:\\.|(?!\1)[^\\])*? - 0+ occurrences (as few as possible due to *? being lazy) of:
      • \\. - an escape sequence
      • | - or
      • (?!\1)[^\\] - any char other than \ and the quote char in Group 1
    • \1 - Same value as in Group 1 (" or ')
  • (*SKIP)(*F) - PCRE verbs that fail the current match and make the engine resume searching from the position where (*SKIP) was reached (right after the quoted substring), so the quoted text is skipped rather than returned
  • | - or
  • (?:[^\\"']|\\.)+ - 1 or more occurrences of:
    • [^\\"'] - a char other than \, ' or "
    • \\. - an escape sequence.
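
Python's standard `re` module does not support the `(*SKIP)(*F)` verbs, so the following sketch swaps in a different technique: the quoted substring is still matched by the first alternative, but instead of being skipped by a verb it is simply ignored, and the text outside quotes is captured in an extra group 2. The function name `outside_quotes` and the whitespace clean-up are assumptions made for this example.

```python
import re

# First alternative: a "..." or '...' substring with escaped-quote support
# (same as in the PCRE pattern). Second alternative: the text outside quotes,
# wrapped in capturing group 2 so it can be told apart from quoted matches.
QUOTED_OR_OUTSIDE = re.compile(
    r'''(["'])(?:\\.|(?!\1)[^\\])*?\1|((?:[^\\"']|\\.)+)'''
)

def outside_quotes(text):
    """Return the chunks of text that are not inside quotes (hypothetical helper)."""
    return [
        " ".join(m.group(2).split())      # collapse whitespace and line breaks
        for m in QUOTED_OR_OUTSIDE.finditer(text)
        if m.group(2) is not None         # group 2 is None for quoted matches
    ]

sample = (
    'Lorem ipsum "dolor" sit amet, consectetur "adipiscing" elit.\n'
    'Nunc ultrices varius odio, "ut accumsan nisi" aliquet vitae.\n'
    '"Ut faucibus augue tortor, at aliquam purus dignissim eget."'
)

print(outside_quotes(sample))
# ['Lorem ipsum', 'sit amet, consectetur', 'elit. Nunc ultrices varius odio,', 'aliquet vitae.']

# Escaped quotes inside a quoted substring do not end it.
print(outside_quotes('say "a \\"b\\" c" done'))
# ['say', 'done']
```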
Wiktor Stribiżew
  • So that nobody takes forever to figure this part out: everything before the | just discards everything inside quotes. I didn't even know you could discard matches with regex. – Dean Or Jul 29 '19 at 22:44