
I have a string of text:

\n new"test \n aaaa" \n ta \n `this is a \n newline that should be kept`

My goal is to match every \n that is outside of backticks (`), double quotes ("), or single quotes ('). Based on an answer to another question (https://stackoverflow.com/a/48953880/14465957), I switched the positive lookahead it used to a negative one, which now matches all newlines outside of double quotes ("). However, it doesn't work when I try to also ignore single quotes and backticks.

What am I doing wrong?

Working version for double quotes only: https://regex101.com/r/ooqz5d/1/

divinelemon

2 Answers


Use

text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1')

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \\                       '\'
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
      [^"\\]*                  any character except: '"', '\\' (0 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    '                        '\''
--------------------------------------------------------------------------------
    [^'\\]*                  any character except: ''', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \\                       '\'
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
      [^'\\]*                  any character except: ''', '\\' (0 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
    '                        '\''
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    `                        '`'
--------------------------------------------------------------------------------
    [^`\\]*                  any character except: '`', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \\                       '\'
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
      [^`\\]*                  any character except: '`', '\\' (0 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
    `                        '`'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \\                       '\'
--------------------------------------------------------------------------------
  n                        'n'

JavaScript code:

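// Use an ordinary string literal: String.raw would keep the backslash in \`,
// and this regex would then treat \` as an escaped backtick instead of a delimiter.
const text = "\\nnew\"test\\naaaa\\\\\\n\"\\nta\\n`this is a \\nnewline that should be kept`\\n'this is a \\nnew test'\\n"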
console.log(text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1'))
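The `'$1'` replacement in the one-liner above works because, in JavaScript, a group that did not take part in a match is substituted as an empty string: a quoted section is written back unchanged, while a bare \n is replaced with nothing. A minimal sketch that makes this explicit with a replacer function; the helper name `stripUnquotedNewlines` and the sample string are hypothetical, and the sample uses literal backslash-n sequences as in the question.

```javascript
// Same pattern as above: group 1 captures a complete "...", '...' or `...` string
// (backslash escapes allowed inside); the other alternative is a literal backslash + n.
const pattern = /("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g;

// Hypothetical helper. If group 1 participated, the match was a quoted string and is
// returned unchanged; otherwise the match was a \n outside quotes and is dropped.
function stripUnquotedNewlines(input) {
  return input.replace(pattern, (match, quoted) => (quoted !== undefined ? quoted : ''));
}

// Literal \n sequences outside and inside each kind of quote.
const sample = "a\\nb \"c\\nd\" 'e\\nf' `g\\nh` i\\n";
console.log(stripUnquotedNewlines(sample)); // prints: ab "c\nd" 'e\nf' `g\nh` i
```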
Ryszard Czech

If you're using PCRE, you can use the (*SKIP)(*F) control verbs to skip everything inside a pair of quotes:

(['"`]).*?\1(*SKIP)(*F)|\\n
(['"`])         any type of quote, put it in group 1
.*?             any characters, non greedy
\1              the same quote character captured in group 1
(*SKIP)(*F)     fail here and resume after the closing quote, so nothing inside the quotes is matched
|\\n            or match a literal \n (backslash followed by n)

See the test cases
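JavaScript's RegExp has no control verbs, so this PCRE pattern is rejected outright rather than behaving differently. A quick check, assuming a Node.js or modern browser runtime; the variable names are only for illustration:

```javascript
// The PCRE pattern from above, written as an ordinary string so that the
// error is raised at run time by the RegExp constructor.
const pcreSource = "(['\"`]).*?\\1(*SKIP)(*F)|\\\\n";

let compileError = null;
try {
  new RegExp(pcreSource, 'g');
} catch (err) {
  // "(*" has no atom before the quantifier, since (*SKIP) and (*F) are not JS syntax
  compileError = err;
}
console.log(compileError instanceof SyntaxError); // prints: true
```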


Also, if you need to ignore escaped quotes (\", \', etc.), you can try:

(['"`])(?:(?<!\\)\\(?:\\\\)*\1|(?!\1).)*\1(*SKIP)(*F)|\\n

Check the test cases


Using JavaScript

JavaScript's regex engine doesn't support control verbs, but you can capture each quoted section in a group and put it back in the replacement, so only the \n outside of quotes is removed:

  • Regex
((['"`])[\s\S]*?\2)|\\n
  • Substitution
$1

const regex = /((['"`])[\s\S]*?\2)|\\n/g;

const text = String.raw`\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;

console.log('before\n', text);

const result = text.replace(regex, '$1');

console.log('after\n', result);
  • Real line breaks

const regex = /((['"`])[\s\S]*?\2)|\n/g;

const text = `\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;

console.log('before\n----\n', text);

const result = text.replace(regex, '$1');

console.log('after\n----\n', result);
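The JavaScript snippets above end a quoted section at the first matching quote character, so an escaped quote such as \" closes it early. A minimal sketch that carries the escape handling of the second PCRE pattern over to the capture-and-restore approach, assuming a backslash always escapes the character after it; the sample string is hypothetical and uses literal backslash-n sequences:

```javascript
// Group 1 = whole quoted string, group 2 = the quote character that opened it.
// Inside the quotes, either a backslash plus any character (an escape) or any
// non-backslash character other than the opening quote is consumed.
const regex = /((['"`])(?:\\[\s\S]|(?!\2)[^\\])*\2)|\\n/g;

// The double-quoted part contains an escaped quote followed by a literal \n.
const text = "\\nsay \"a \\\" b\\nc\"\\nend\\n";

const result = text.replace(regex, '$1');
console.log(result); // prints: say "a \" b\nc"end
```

With the simpler `((['"`])[\s\S]*?\2)|\\n` pattern, the same input comes out as `say "a \" bc"end`: the quoted section stops at `\"`, so the \n after `b` is treated as being outside the quotes.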
Hao Wu
  • I'm accepting this as the answer because of the simplicity, and the added fact that you went the extra step with the escaped quotes. Thanks so much for helping! – divinelemon Jun 04 '21 at 03:49
  • Just went to test this in my JS program and realized that it doesn't work (in JS). Is there a reason for this? Thanks! (regex101: https://regex101.com/r/Vv1fPi/1) – divinelemon Jun 04 '21 at 03:58
  • 1
    @divinelemon JavaScript regex engine doesn't support control verbs! You need to specify which regex flavour you are using when you ask a question about regex. – Hao Wu Jun 04 '21 at 04:33