1

I have a string like the one below:

text = "\n first \n second \n third"

I want to split this string on newline characters and keep the delimiter (\n or \r\n) at the end of each piece. So far I have tried text.split( /(?=\r?\n)/g ). The result looks like this:

["↵ first ", "↵ second ", "↵ third"]

But I want this:

["↵", " first ↵", " second ↵", " third"]

What is the correct Regex for that?

Reza
  • 2
    If your environment supports lookbehind (=if it supports ECMAScript 2018), use `text.split(/(?<=\r?\n)/)` – Wiktor Stribiżew Nov 05 '18 at 20:59
  • Possible duplicate of [Javascript and regex: split string and keep the separator](https://stackoverflow.com/questions/12001953/javascript-and-regex-split-string-and-keep-the-separator) – zfrisch Nov 05 '18 at 21:00
  • @WiktorStribiżew Unfortunately, it doesn't support it – Reza Nov 05 '18 at 21:05

4 Answers

2

Your JavaScript version might not support lookbehinds. But here is a trick we can use which avoids them:

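let text = "\n first \n second \n third";
// Double every newline so that one copy can be consumed by the split.
text = text.replace(/\n/g, "\n\n");
// Split on a newline that is not followed by another newline.
const terms = text.split(/\n(?!\n)/);
console.log(terms);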

This works by replacing every newline \n with two of them \n\n, and then splitting on \n(?!\n). That is, after making this replacement, we split on \n which is not followed by another newline character. This results in consuming the second newline during the split, while retaining the first one which we want to appear in the output.
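
To make the two steps easier to follow, here is a minimal sketch that wraps them in a function and prints the intermediate string. The name `splitByDoubling` is hypothetical and chosen only for illustration; the sketch assumes plain `\n` line endings, as in the answer.

```javascript
// Hypothetical helper (not part of the original answer); assumes LF-only input.
function splitByDoubling(text) {
  // Step 1: every "\n" becomes "\n\n".
  const doubled = text.replace(/\n/g, "\n\n");
  // Step 2: split on a "\n" that is not followed by another "\n".
  // In each pair this consumes the second newline and keeps the first.
  return doubled.split(/\n(?!\n)/);
}

// Intermediate string after step 1:
console.log(JSON.stringify("\n first".replace(/\n/g, "\n\n"))); // "\n\n first"

console.log(splitByDoubling("\n first \n second \n third"));
// → ["\n", " first \n", " second \n", " third"]
```

One caveat worth checking against your data: a run of consecutive newlines in the input is not split into separate items, because after doubling only the last newline of the run satisfies `\n(?!\n)`.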

Tim Biegeleisen
  • Sounds great, but it's a bit of a hack. Is there any other good way? – Reza Nov 05 '18 at 21:04
  • @Reza Yes, read Wiktor's comment above, and see if you can use a lookbehind to check for a `\n` prior to the split. If you _can't_ use lookarounds, then you'll need some alternative, such as my answer. – Tim Biegeleisen Nov 05 '18 at 21:05
2

You may match any text up to a CRLF, an LF, or the end of the string:

text.match(/.*(?:$|\r?\n)/g).filter(Boolean)
// -> (4) ["↵", " first ↵", " second ↵", " third"]

The .*(?:$|\r?\n) pattern matches

  • .* - any 0 or more chars other than newline
  • (?:$|\r?\n) - either end of string or an optional carriage return and a newline.
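
As a quick check of the pattern described above, the sketch below wraps it in a function and verifies that joining the pieces gives back the input. `splitKeepingLineBreaks` is a hypothetical name used only here. It assumes line breaks are `\n` or `\r\n`; since `.` in JavaScript does not match `\r`, a lone `\r` is not handled by this pattern.

```javascript
// Hypothetical helper (name is an assumption, not from the answer).
function splitKeepingLineBreaks(text) {
  // The pattern also produces one empty match at the end of the string,
  // which filter(Boolean) removes.
  return text.match(/.*(?:$|\r?\n)/g).filter(Boolean);
}

const samples = [
  "\n first \n second \n third",
  "\r\n first \r\n second \r\n third",
  "\n\n\n first",
];

for (const sample of samples) {
  const parts = splitKeepingLineBreaks(sample);
  // Second value is true when no characters were lost or added.
  console.log(JSON.stringify(parts), parts.join("") === sample);
}
```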

JS demo:

console.log("\r\n first \r\n second \r\n third".match(/.*(?:$|\r?\n)/g));
console.log("\n first \r\n second \r third".match(/.*(?:$|\r?\n)/g));
console.log("\n\n\n first \r\n second \r third".match(/.*(?:$|\r?\n)/g));

In JS environments that support the ECMAScript 2018 standard, it is as simple as using a lookbehind pattern like

text.split(/(?<=\r?\n)/)

It splits at every position that immediately follows an LF, whether or not that LF is preceded by a CR.
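
If you are unsure whether the target engine supports lookbehind, one possible approach is to detect it at runtime and fall back to the match-based pattern from the start of this answer. This is only a sketch: `splitAfterNewlines` is a hypothetical name, and the regex is built with the `RegExp` constructor on the assumption that an unsupported lookbehind then throws a catchable `SyntaxError` instead of breaking script parsing.

```javascript
// Hypothetical helper; not part of the original answer.
function splitAfterNewlines(text) {
  let lookbehind = null;
  try {
    // Built from a string so that older engines fail here, inside the try,
    // rather than when the script is parsed.
    lookbehind = new RegExp("(?<=\\r?\\n)");
  } catch (e) {
    lookbehind = null;
  }
  if (lookbehind) {
    return text.split(lookbehind);
  }
  // Fallback without lookbehind: match each line together with its line break.
  return text.match(/.*(?:$|\r?\n)/g).filter(Boolean);
}

console.log(splitAfterNewlines("\n first \n second \n third"));
// → ["\n", " first \n", " second \n", " third"]
```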

Another splitting regex is /^(?!$)/m:

console.log("\r\n first \r\n second \r\n third".split(/^(?!$)/m));
console.log("\n first \r\n second \r third".split(/^(?!$)/m));
console.log("\n\n\n first \r\n second \r third".split(/^(?!$)/m));

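Here, the strings are split at the start of every non-empty line, that is, at each position right after a CR or LF that is not itself immediately followed by another line break or the end of the string.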

Note you do not need a global modifier with String#split since it splits at all found positions by default.

Wiktor Stribiżew
  • sounds great :) – Reza Nov 05 '18 at 21:10
  • this ( /^(?!$)/m ) will fail if you have two consecutive \n like "\n\n first \n second \n third" – Reza Nov 05 '18 at 22:20
  • @Reza No, see the third example in the last demo. Or do you mean there should be 3 separate `\n` items? Then the first solution is the most robust. – Wiktor Stribiżew Nov 05 '18 at 22:21
  • 1
    The result of the third example is ["↵↵↵", " first ↵", " second ", " third"], while it should be ["↵", "↵", "↵", " first ↵", " second ↵", " third"], like the result of the ECMAScript 2018 pattern – Reza Nov 05 '18 at 22:25
2

You could match on [^\n]*\n? (with the g flag enabled):

text = "\n\n first \n\n sth \r with \r\n second \r\n third \n forth \r";
console.log(text.match(/[^\n]*\n?/g));

You may need to .pop() the returned array, because the last match is always an empty string:

var matches = text.match(/[^\n]*\n?/g);
matches.pop();
revo
  • The question asks for `\n` in both title and context and not `\r\n`. I only see `\r` in the OP's regex which is probably there by accident. – revo Nov 05 '18 at 21:19
  • @revo it would be great if your answer handled \r – Reza Nov 05 '18 at 21:21
  • this ( /(?=.)^/mg ) will fail if you have two consecutive \n like "\n\n first \n second \n third" – Reza Nov 05 '18 at 22:21
  • @Reza Updated. That's the simplest right way to achieve it. – revo Nov 06 '18 at 13:19
0

You can use this simple regex:

/.*?(\n|$)/g

It lazily matches any run of characters other than line terminators, followed by either a newline '\n' or the end of the string.

You can access the matches as an array (Works like splitting but keeps the separator in the match).
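
A minimal usage sketch for the regex above, using the string from the question. It assumes `\n`-only line endings, since the pattern as written does not mention `\r`. The `filter(Boolean)` call is an addition here: the pattern also matches an empty string at the very end of the input, which would otherwise show up as a trailing `""`.

```javascript
const text = "\n first \n second \n third";

// String#match with the g flag returns all matches as an array.
const parts = text.match(/.*?(\n|$)/g).filter(Boolean);

console.log(parts);
// → ["\n", " first \n", " second \n", " third"]
```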

Poul Bak