
I would like to find all occurrences of /".*?"/ except when wrapped in inline code (single backtick) or code block (triple backtick).

This is what I have so far (doesn’t work as expected).

/(?<!(`|```))".*?"(?!\1)/g

In the following markdown snippet, I would like to only find "rabbit hole". Unfortunately, I cannot include an example of a code block (I don’t know how to escape nested triple backticks), but the same logic applies.

When copy/pasting commands that start with `cat << "EOF"`, select all lines at once (from `cat << "EOF"` to `EOF` inclusively) as they are part of the same (single) command

Figuring out how to ignore the above quoted strings is an interesting "rabbit hole".
sunknudsen
  • @WiktorStribiżew Actually, I have been using regex for years... somehow I am not able to figure out this use case. – sunknudsen Oct 06 '20 at 13:45
  • @WiktorStribiżew Please consider reopening the question. – sunknudsen Oct 06 '20 at 13:46
  • Without seeing your *code*, the regex you added does not help understand what you are doing. Splitting? Replacing? Wrapping with some other text? Extracting? What kind of text is the input? – Wiktor Stribiżew Oct 06 '20 at 13:46
  • @WiktorStribiżew I just updated the question to include an example. Thanks for your help! – sunknudsen Oct 06 '20 at 13:53
  • Sorry, still not clear a bit: so you want to *extract*, right? See ``(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"`` [demo](https://regex101.com/r/FXngBO/1). You need Group 2. – Wiktor Stribiżew Oct 06 '20 at 13:58
  • @WiktorStribiżew I am trying to write a script to convert dumb quotes to smart quotes. To achieve this, I need to find all quoted content that is not part of inline code or code blocks. – sunknudsen Oct 06 '20 at 14:06
  • @WiktorStribiżew In the demo, only "rabbit hole" should match as I am trying to ignore matches wrapped in inline code or code blocks. – sunknudsen Oct 06 '20 at 14:09
  • You can't just ignore these matches in between identical right/left-hand delimiters, you need to match and **consume** them, then match and capture what you need. That is why I asked what you are doing, it is impossible to answer your question without these details. So, I understand you are replacing. So, try ``text.replace(/(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/g, (x,y,z) => z ? `‘${z}’` : x)`` – Wiktor Stribiżew Oct 06 '20 at 14:13
  • Thanks for your help @WiktorStribiżew. So there is no way to ignore `/".*?"/g` matches when they are found between a single backtick or triple backticks using only regex? – sunknudsen Oct 06 '20 at 14:20
  • It is not possible with *JavaScript* regex. [It is possible](https://regex101.com/r/FXngBO/2) with PCRE and the Python PyPI `regex` library. – Wiktor Stribiżew Oct 06 '20 at 14:43
  • Thanks a lot for your help @WiktorStribiżew. – sunknudsen Oct 06 '20 at 15:54
  • What shall we do with the question? Is the [solution I suggested](https://stackoverflow.com/questions/64225928/how-to-find-quoted-strings-except-when-wrapped-in-inline-code-or-code-block?noredirect=1#comment113574781_64225928) acceptable? – Wiktor Stribiżew Oct 06 '20 at 17:00
  • @WiktorStribiżew I believe an answer that includes the subtleties you shared (including the fact it cannot be done using JavaScript regular expressions) would be of interest to others. Looking at your answers (both JavaScript and PCRE alternatives), solving this problem requires advanced regular expression knowledge. Not sure how large the audience is for that, but it helped me, so thanks! – sunknudsen Oct 06 '20 at 17:23

1 Answer


It is not possible to match only the strings between double quotes that sit outside single or triple backticks with a plain JavaScript regex. It is possible with PCRE and the Python PyPI regex library because they support the (*SKIP)(*F) construct, which lets the pattern match a code span, discard it, and resume searching after it.

In JavaScript, you can combine the regex with a replacement callback to get what you need:

text.replace(/(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/g, 
      (x,y,z) => z ? `‘${z}’` : x)

See the regex demo. When z (Group 2) is matched, the match is a quoted string outside code, so you may replace its quotes with curly quotes; otherwise, return x, the whole match value (a code span), unchanged.
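As a minimal, runnable sketch of the replacement above, here it is applied to the sample text from the question. The helper name `curlQuotes` and the `sample` string are illustrative and not part of the original answer. The callback also tests Group 2 against `undefined` instead of for truthiness, which is my assumption so that an empty `""` is converted too.

```javascript
// Pattern from the answer: the first alternative consumes inline code and
// fenced code, the second captures the text between double quotes in Group 2.
const pattern = /(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/g;

// Hypothetical helper name, used only for this illustration.
function curlQuotes(text) {
  return text.replace(pattern, (match, fence, quoted) =>
    // Group 2 is undefined when the code alternative matched:
    // in that case return the whole match unchanged.
    quoted !== undefined ? `‘${quoted}’` : match
  );
}

const sample = [
  'When copy/pasting commands that start with `cat << "EOF"`, select all lines at once.',
  'Figuring out how to ignore the above quoted strings is an interesting "rabbit hole".',
].join("\n");

console.log(curlQuotes(sample));
```

With this input, the `cat << "EOF"` code span is left untouched, and only "rabbit hole" is rewritten with curly quotes.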

Regex details

  • (?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1:
    • (?<!`) - no backtick immediately to the left is allowed
    • (`(?:`{2})?) - Group 1: a backtick and then an optional double backtick sequence
    • (?:(?!\1).)*? - any char other than a line break char, zero or more occurrences but as few as possible, that does not start the same char sequence that is captured in Group 1
    • \1 - the same char sequence that is captured in Group 1
  • | - or
  • "([^"]*)" - ", Group 2: any zero or more chars other than ", and then a ".
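One consequence of the breakdown above: since `.` does not match line breaks, the code alternative can only consume code that opens and closes on the same line. The sketch below, which is my own addition and not part of the original answer, compares the pattern as given with a variant that adds the `s` (dotAll) flag so a multi-line fenced block is consumed as well. The helper `quotedOutsideCode` and the test document are hypothetical.

```javascript
// Same pattern twice; only the flags differ.
const lineOnly = /(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/g;
const dotAll = /(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/gs;

// Hypothetical helper: collect Group 2 from every match that has one.
function quotedOutsideCode(text, regex) {
  return [...text.matchAll(regex)]
    .filter((m) => m[2] !== undefined)
    .map((m) => m[2]);
}

// The fence is built with repeat() to avoid nesting a literal fence here.
const fence = "`".repeat(3);
const doc = [
  fence,
  'cat << "EOF"',
  fence,
  'An interesting "rabbit hole".',
].join("\n");

console.log(quotedOutsideCode(doc, lineOnly)); // [ 'EOF', 'rabbit hole' ]
console.log(quotedOutsideCode(doc, dotAll));   // [ 'rabbit hole' ]
```

The trade-off with the `s` flag is that an unbalanced single backtick can then swallow text up to the next backtick on a later line, so it is worth testing against real documents before relying on it.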
Wiktor Stribiżew