
When matching a newline in a string, is /[\r\n]/ the same as /[\n]/?

I was reading this blog post: https://davidwalsh.name/remove-multiple-new-lines

and it says to use /[\r\n]/, but I am wondering whether that always matches the same thing as /[\n]/, or whether they are more like a Venn diagram, where each might match something the other does not.

  • very simple answer: if the letters are different, the regex is different. `\r` and `\n` are different characters (carriage return vs. newline), so just like `[ab]` and `[b]` are different patterns, the two patterns you show do different things. – Mike 'Pomax' Kamermans Sep 06 '18 at 03:08
  • ah yes, so `/\r\n/` is different from `/[\r\n]/` in that way; if you can add an answer, that would help. –  Sep 06 '18 at 03:10
  • that is not your question. Your question is about `[\r\n]` vs `[\n]`, not about using [character classes](https://www.regular-expressions.info/refcharclass.html) vs. not using character classes. – Mike 'Pomax' Kamermans Sep 06 '18 at 03:10
  • @Mike'Pomax'Kamermans you had it right the first time; you understood my confusion, and your first comment is right. When I first asked the question, I forgot about the `[]` brackets. –  Sep 06 '18 at 03:11
  • give https://www.regular-expressions.info/ a read-through. It is _the_ place anyone with questions about regex should hit up first (once they know it exists, of course) – Mike 'Pomax' Kamermans Sep 06 '18 at 03:12

2 Answers


No, they're not the same thing. \r matches a carriage return (CR), while \n matches a line feed (LF). They're separate characters. On Windows, line breaks are conventionally written as \r\n, while on Unix-like systems they are \n alone.

Here's an example:

```javascript
const file = 'line\r\nline2'; // Windows-style CRLF line break
const file2 = 'line\nline2';  // Unix-style LF line break

console.log(file.replace(/[\n]/g, '\nNEW LINE:\n')); // one replacement
console.log(file2.replace(/[\n]/g, '\nNEW LINE:\n')); // one replacement

console.log(file.replace(/[\r\n]/g, '\nNEW LINE:\n')); // two replacements (the \r and the \n)
console.log(file2.replace(/[\r\n]/g, '\nNEW LINE:\n')); // one replacement
```

As you can see, you cannot just use [\n] instead of [\r\n]: the output differs because the two patterns match different characters of the string.
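To put the Venn-diagram question concretely: `[\r\n]` matches every character that `[\n]` matches, plus a lone carriage return, so there is no input where `[\n]` finds something `[\r\n]` does not. The sketch below (the helper `matchesOf` and the sample strings are illustrative, not from the original answer) lists what each pattern matches, with control characters escaped so they are visible:

```javascript
// Hypothetical helper: list every match of a global regex, escaping
// control characters via JSON.stringify so they show up when printed.
function matchesOf(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => JSON.stringify(m[0]));
}

const samples = {
  windows: "line\r\nline2", // CRLF
  unix: "line\nline2", // LF
  classicMac: "line\rline2", // CR only
};

for (const [name, text] of Object.entries(samples)) {
  console.log(
    name,
    "[\\n]:", matchesOf(/[\n]/g, text),
    "[\\r\\n]:", matchesOf(/[\r\n]/g, text)
  );
}
```

For the CR-only sample, `[\n]` finds nothing while `[\r\n]` finds the `\r`; for the CRLF sample, `[\r\n]` reports two separate single-character matches.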

Often, when you want to match a generic new line in an unknown format, you can use

\r?\n

to match the carriage return if it exists, followed by the line feed.
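As a sketch of that pattern in use (the function name `splitLines` is hypothetical), splitting on `/\r?\n/` yields the same lines for CRLF and LF input:

```javascript
// Hypothetical helper: split text into lines, accepting both
// Windows (CRLF) and Unix (LF) line endings.
function splitLines(text) {
  // \r? makes the carriage return optional, so "\r\n" and "\n"
  // are each consumed as one line break.
  return text.split(/\r?\n/);
}

console.log(JSON.stringify(splitLines("one\r\ntwo\r\nthree"))); // ["one","two","three"]
console.log(JSON.stringify(splitLines("one\ntwo\nthree")));     // ["one","two","three"]
```

A lone `\r` is not treated as a break by this pattern: `splitLines("one\rtwo")` returns a single element.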

As a comment notes, classic Mac OS used \r alone, so to match those line breaks as well, you could look ahead for \r or \n and then match:

(?=\r|\n)\r?\n?
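A quick sketch of that pattern on a string mixing all three conventions (the sample text is made up for illustration):

```javascript
// The lookahead requires the next character to be \r or \n, so \r?\n?
// never matches the empty string. It then consumes CRLF, a lone CR,
// or a lone LF as a single line break.
const anyLineBreak = /(?=\r|\n)\r?\n?/;

const mixed = "mac\rwindows\r\nunix\nend";
const parts = mixed.split(anyLineBreak);
console.log(JSON.stringify(parts)); // ["mac","windows","unix","end"]
```

The lookahead is what makes this safe: on its own, `/\r?\n?/` can match the empty string, so `split` would cut the text between every character.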
CertainPerformance
  • AFAIK Mac uses '\r' for newlines, which your regex will not match. To always get a match I use: ([\r\n]|\n|\r) – Poul Bak Sep 06 '18 at 03:26
  • And of course a character class containing only a single character doesn't need the character class. So `/[\n]/` is equivalent to `/\n/` but obviously `/[\r\n]/` is not equal to `/\r\n/` (the former matches only a single character whilst the latter matches two characters in a particular sequence). – tripleee Sep 06 '18 at 04:07
  • @PoulBak `\r` applies mostly to the classic Mac and old versions of Mac OS X. See also [this at superuser](https://superuser.com/questions/439440/did-mac-os-lion-switch-to-using-line-feeds-lf-n-for-line-breaks-instead-of). Some GUI apps may still use CR for linebreaks but anything Terminal related uses `\n`. – wp78de Sep 06 '18 at 04:12
  • Great, learned something new today (never worked with Mac, just read it somewhere). – Poul Bak Sep 06 '18 at 19:02

The answer is, as often, it depends.

In general, \n and \r are not the same. Traditionally, in regex engines:

  • \n maps to the ASCII LF character on most platforms (including Unix and DOS/Windows). On classic Mac OS systems (and old OS X versions), it maps to the ASCII CR character instead.

  • \r, in turn, maps to the ASCII CR character, but on (old) Mac OS systems to LF.
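In JavaScript specifically, the meaning of these escapes is fixed by the language rather than the OS: `\n` is always U+000A (LF) and `\r` is always U+000D (CR). A quick check, as a sketch:

```javascript
// Escape sequences map to fixed code units in JavaScript, on every OS.
console.log("\n".charCodeAt(0)); // 10 (LF, U+000A)
console.log("\r".charCodeAt(0)); // 13 (CR, U+000D)

// Regex escapes match the same characters as the string escapes.
console.log(/^\n$/.test("\u000A")); // true
console.log(/^\r$/.test("\u000D")); // true
console.log(/^\n$/.test("\r"));     // false
```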

As time goes on, the old Mac style is becoming irrelevant. To demonstrate this at least in part, here is a browser shot of Safari 9.1 on Mac OS 10.8 matching \r (result) and \r?\n (result) against a single line break: there is a match only when \n is present in the regex.

However, in JavaScript there are cases where you never see a \r. For instance, if you define a multiline string using a template literal, you always get a line feed, regardless of the OS-specific newline convention (explanation).
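The template-literal behavior can be reproduced without saving a file with CRLF endings: the sketch below builds source text with a raw CR LF (and a raw CR) inside backticks and evaluates it. `eval` is used here purely for demonstration.

```javascript
// Source text for a template literal containing a real CR LF pair.
const crlfSource = "`first\r\nsecond`";
// Source text for a template literal containing a lone CR.
const crSource = "`first\rsecond`";

// The ECMAScript spec normalizes line terminators inside template
// literals to LF, so both evaluate to "first\nsecond".
const fromCrlf = eval(crlfSource);
const fromCr = eval(crSource);

console.log(JSON.stringify(fromCrlf)); // "first\nsecond"
console.log(JSON.stringify(fromCr));   // "first\nsecond"
```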

Nonetheless, if you write a string literal such as '\r\n' in your source code, or read text from a file or stream that contains OS-specific line breaks, you still have to deal with \r.
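When text arrives from a file or stream, one common approach (a sketch, not taken from either answer; the name `normalizeNewlines` is hypothetical) is to convert every line break to `\n` up front, so later regexes only have to look for `\n`:

```javascript
// Hypothetical helper: convert CRLF (Windows) and lone CR (classic Mac)
// line breaks to LF so downstream code only has to handle "\n".
function normalizeNewlines(text) {
  // \r\n? matches "\r\n" as one unit, or a lone "\r".
  return text.replace(/\r\n?/g, "\n");
}

// Simulated file contents with mixed line endings.
const raw = "alpha\r\nbeta\rgamma\ndelta";
console.log(JSON.stringify(normalizeNewlines(raw))); // "alpha\nbeta\ngamma\ndelta"
```

In Node.js the input might come from `fs.readFileSync(path, "utf8")`; the normalization step is the same.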

To answer your initial question,

\r?\n

is usually a safe bet for removing excess newlines.
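Applied to the blog post's goal of removing repeated line breaks, a minimal sketch (the function name `collapseNewlines` is hypothetical) collapses each run of CRLF or LF breaks into a single `\n`:

```javascript
// Hypothetical helper: replace each run of one or more line breaks
// (CRLF or LF) with a single LF.
function collapseNewlines(text) {
  // (?:\r?\n)+ pairs an optional CR with its LF, then repeats the group,
  // so "\r\n\r\n\r\n" is consumed as one run.
  return text.replace(/(?:\r?\n)+/g, "\n");
}

const messy = "first\r\n\r\n\r\nsecond\n\n\nthird";
console.log(JSON.stringify(collapseNewlines(messy))); // "first\nsecond\nthird"
```

Lines that contain only spaces or tabs are not treated as blank by this pattern.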

Or, if you really have to deal with old Mac-style line breaks, use \r\n?|\n
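The order inside `\r\n?|\n` matters: `\r\n?` is tried first and its optional `\n` is greedy, so a Windows CRLF is consumed as one break rather than two. A sketch comparing it with the naive `\r|\n` (the sample string is made up):

```javascript
const text = "a\r\nb\rc\nd";

// \r\n? takes CRLF as one unit (or a lone CR); \n handles Unix breaks.
const safe = text.split(/\r\n?|\n/);
// \r|\n treats the CR and the LF of a CRLF pair as two separate breaks,
// leaving an empty string between them.
const naive = text.split(/\r|\n/);

console.log(JSON.stringify(safe));  // ["a","b","c","d"]
console.log(JSON.stringify(naive)); // ["a","","b","c","d"]
```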

wp78de