12

I am trying to construct a regex that will match a solitary newline character (\n).

Similarly, I need another regex to match double newlines (\n\n) that are not part of a longer run of newline characters like \n\n\n or \n\n\n\n\n\n etc.

\n(?!\n) and \n\n(?!\n) match too much (they match the last newline(s) in a longer sequence of newlines). What can I do instead?
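
To make the problem concrete, here is a minimal sketch of what the two attempted patterns do on a run of three newlines. The sample string and variable names are made up for illustration.

```javascript
// Three consecutive newlines: neither pattern should match here,
// but both do, because nothing constrains what comes BEFORE the match.
const run = "a\n\n\nb";

// Collect the start index of every match of each attempted pattern.
const singleHits = [...run.matchAll(/\n(?!\n)/g)].map((m) => m.index);
const doubleHits = [...run.matchAll(/\n\n(?!\n)/g)].map((m) => m.index);

console.log(singleHits); // [ 3 ]  the last newline of the run
console.log(doubleHits); // [ 2 ]  the last two newlines of the run
```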

Tim Pietzcker
  • Well, I don't like such a comment. Why on earth do you think I haven't tried anything? –  Aug 02 '13 at 07:48
  • @KenOKABE: How on earth would we know that you tried something unless you show us what you tried? You have been on `SO` for almost 2 years now. You should know the rules. – Rohit Jain Aug 02 '13 at 07:49
  • @KenOKABE: If you don't like those comments, [check the help-center on how to avoid them](http://stackoverflow.com/help/asking): A good question shows what you've tried already, and where you've looked. This avoids other users posting answers with what you've already tried. – Elias Van Ootegem Aug 02 '13 at 07:53
  • Well, I didn't know I needed to prove my effort to ask something here. I have never asked anything here without struggling for hours. –  Aug 02 '13 at 07:55
  • I had the same question a while ago: http://stackoverflow.com/questions/10319696/match-exactly-n-repetitions-of-the-same-character , but the answer involves lookbehinds which JS doesn't support. So I'm afraid there's no single regexp for this in JS. – georg Aug 02 '13 at 08:09
  • @KenOKABE: If you have struggled for hours, it's a good idea to show what you've tried for several reasons: 1.: the SO community can see that you're not one of those drive-by "do my work for me" askers that nobody likes (and unfortunately, your question in its current state makes you look like one of those. Snide comments don't improve this impression). 2.: More importantly, it gives us a chance to explain *why* your attempts failed. This in turn gives you and everyone else who reads this question a chance to understand the problem better. 3.: You get more upvotes for your question. – Tim Pietzcker Aug 02 '13 at 09:27
  • Thanks. Well, I understand your suggestion. I usually do what you say. See http://stackoverflow.com/questions/17715208/where-exactly-does-the-performance-advantage-of-lazyevalutaion-emerge-from for one. When it comes to regex, I really have no clue, and it feels pointless to present my random thoughts. Having said that, I respect what happened, and should improve things. –  Aug 02 '13 at 09:48

3 Answers

15

Since JavaScript doesn't support lookbehind assertions (before ECMAScript 2018), you need to match one additional character before your \ns and remember to deal with it later (i.e., restore it if you use the regex match to modify the original string).

(^|[^\n])\n(?!\n)

matches a single newline plus the preceding character, and

(^|[^\n])\n{2}(?!\n)

matches double newlines plus the preceding character.

So if you want to replace a single \n with a <br />, for example, you have to do

result = subject.replace(/(^|[^\n])\n(?!\n)/g, "$1<br />");

For \n\n, it's

result = subject.replace(/(^|[^\n])\n{2}(?!\n)/g, "$1<br />");

See it on regex101

Explanation:

(       # Match and capture in group number 1:
 ^      # Either the start of the string
|       # or
 [^\n]  # any character except newline.
)       # End of group 1. This submatch will be saved in $1.
\n{2}   # Now match two newlines.
(?!\n)  # Assert that the next character is not a newline.
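
As a quick check of the two replacements, here is a minimal sketch that wraps them in helpers. The function names and the sample string are hypothetical; the regexes are the ones shown in this answer. A replacer function is used instead of a "$1..." string so that a `$` in the replacement text is not interpreted specially.

```javascript
// Hypothetical helpers; only the two regexes come from the answer.
function replaceSingleNewlines(subject, replacement) {
  // `before` is group 1: the character in front of the newline,
  // or the empty string when the newline is at the start of the input.
  return subject.replace(/(^|[^\n])\n(?!\n)/g, (match, before) => before + replacement);
}

function replaceDoubleNewlines(subject, replacement) {
  return subject.replace(/(^|[^\n])\n{2}(?!\n)/g, (match, before) => before + replacement);
}

// One single newline, one double, one triple, and a trailing single newline.
const sample = "a\nb\n\nc\n\n\nd\n";

const singlesReplaced = replaceSingleNewlines(sample, "<br />");
const doublesReplaced = replaceDoubleNewlines(sample, "<p>");

console.log(JSON.stringify(singlesReplaced)); // "a<br />b\n\nc\n\n\nd<br />"
console.log(JSON.stringify(doublesReplaced)); // "a\nb<p>c\n\n\nd\n"
```

The preceding character is never a newline and every match ends in a newline, so two matches cannot compete for the same character.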
Tim Pietzcker
  • Yes, nice stuff, however, what if we want to collect the matches in an array rather than replacing them? Can you do that without a helper function? – georg Aug 02 '13 at 09:04
  • Thank you. I respect thg435's survey on this issue, and I am looking forward to further discussion. Having said that, I need this technique to replace. Thank you very much. How about a single \n? Am I missing something very basic? –  Aug 02 '13 at 09:07
  • @thg435: Why would you want to match a series of identical strings in an array? You already know that a match consists of two newlines (discounting the extra character) :) – Tim Pietzcker Aug 02 '13 at 09:08
  • @KenOKABE: There is a regex for single `\n`s in my answer. Is that not what you wanted? – Tim Pietzcker Aug 02 '13 at 09:09
  • @TimPietzcker: yes, but I'm looking for a general answer. For example, find all groups of exactly two letters in a string - is it possible without a helper? – georg Aug 02 '13 at 09:12
  • Tim, right. That is not what I'm looking for. If we use \n on your regex101, the result would be -> Don't change thisTESTDo change thatTESTTEST(two newlines) but don't change thisTESTTESTTESTbecause three newlines follow <- I need to filter out the TESTTEST things, because what is needed is a single \n –  Aug 02 '13 at 09:18
  • @KenOKABE: I think you have misunderstood. I *have* provided a regex that will match single `\n`s in my answer above (the first one). You need to use that, not `\n`. – Tim Pietzcker Aug 02 '13 at 09:19
  • @thg435: You could do it with another capturing group, but you'd have to iterate over the matches to extract that group because JavaScript doesn't offer direct access to all matches of a single capturing group. – Tim Pietzcker Aug 02 '13 at 09:22
  • Tim, yeah, now I understand. Your solution works perfectly, and this Q&A really helps me out, and I hope it helps others, too. Thank you very much again. Same goes to thg435. –  Aug 02 '13 at 09:23
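
The idea from the comments above (a second capturing group, then iterating over the matches) might look roughly like this. This is only a sketch: the function name and return shape are hypothetical, and it relies on `String.prototype.matchAll` (ES2020), which did not exist when the comments were written; a `RegExp.prototype.exec` loop would do the same job in older engines.

```javascript
// Hypothetical helper: find every run of exactly `n` newlines without lookbehind.
function findExactNewlineRuns(subject, n) {
  // Group 1: the extra leading character (or empty at start of input).
  // Group 2: the newlines we are actually interested in.
  const pattern = new RegExp("(^|[^\\n])(\\n{" + n + "})(?!\\n)", "g");
  const runs = [];
  for (const m of subject.matchAll(pattern)) {
    // Skip past group 1 so the reported index points at the first newline.
    runs.push({ index: m.index + m[1].length, text: m[2] });
  }
  return runs;
}

const singleRuns = findExactNewlineRuns("a\nb\n\nc\n\n\nd\n", 1);
const doubleRuns = findExactNewlineRuns("a\nb\n\nc\n\n\nd\n", 2);

console.log(singleRuns.map((r) => r.index)); // [ 1, 10 ]
console.log(doubleRuns.map((r) => r.index)); // [ 3 ]
```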
4

All JavaScript environments compliant with ECMAScript 2018 support lookbehind.

Thus, you may use

(?<!\n)\r?\n(?!\r?\n)

to match a single CRLF or LF line break sequence. If you need to match two line breaks, wrap the consuming \r?\n part of the pattern in a group and apply a quantifier to it: (?<!\n)(?:\r?\n){2}(?!\r?\n) matches a double line break sequence.

Details:

  • (?<!\n) - a negative lookbehind that fails the match if there is an LF char immediately to the left of the current location
  • \r?\n - an optional CR and then an LF char
  • (?!\r?\n) - a negative lookahead that fails the match if there is an optional CR and then an LF char immediately to the right of the current location.

See the JavaScript demo showing how to replace in-paragraph line break sequences, i.e. those single line break sequences:

const text = "This\nis\nparagraph\none\n\nThis is the\nsecond\nparagraph";
console.log( text.replace(/(?<!\n)\r?\n(?!\r?\n)/g, "<br />") );
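
One caveat seems worth checking on CRLF input: (?<!\n) inspects only one character, so a match can apparently start at the bare \n of a \r\n pair that itself follows another line break. The sketch below compares the pattern above with a variant whose lookbehind is widened to [\r\n]. The variant, the variable names, and the sample strings are assumptions of this sketch, not part of the original answer.

```javascript
const single = /(?<!\n)\r?\n(?!\r?\n)/g;                 // pattern from the answer
const singleWide = /(?<![\r\n])\r?\n(?!\r?\n)/g;         // assumed variant: also reject a preceding CR
const doubleWide = /(?<![\r\n])(?:\r?\n){2}(?!\r?\n)/g;  // same widening for double breaks

const lf = "one\ntwo\n\nthree";
const crlf = "one\r\ntwo\r\n\r\nthree";

const lfSingle = lf.replace(single, "<br />");
const crlfSingle = crlf.replace(single, "<br />");
const crlfSingleWide = crlf.replace(singleWide, "<br />");
const crlfDoubleWide = crlf.replace(doubleWide, "</p><p>");

console.log(JSON.stringify(lfSingle));       // "one<br />two\n\nthree"
console.log(JSON.stringify(crlfSingle));     // "one<br />two\r\n\r<br />three"
console.log(JSON.stringify(crlfSingleWide)); // "one<br />two\r\n\r\nthree"
console.log(JSON.stringify(crlfDoubleWide)); // "one\r\ntwo</p><p>three"
```

On LF-only text both lookbehinds behave the same; the difference only shows up when the input contains consecutive CRLF pairs.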
Wiktor Stribiżew
1

To match exactly N repetitions of the same character you need lookaheads and lookbehinds (see Match exactly N repetitions of the same character). Since JavaScript (before ECMAScript 2018) doesn't support the latter, a pure regexp solution seems to be impossible there. You'll have to use a helper function, for example:

> x = "...a...aa...aaa...aaaa...a...aa"
"...a...aa...aaa...aaaa...a...aa"
> x.replace(/a+/g, function($0) {
        return $0.length == 2 ? '@@' : $0;
    })
"...a...@@...aaa...aaaa...a...@@"
georg
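
Applied to the newline case from the question, the helper-function approach in this answer could be sketched as follows. The function name and sample string are hypothetical; the technique (match every maximal run, then decide by length in the callback) is the one shown above.

```javascript
// Hypothetical helper: replace only those newline runs whose length is exactly `n`.
function replaceExactRuns(subject, n, replacement) {
  // /\n+/g always grabs a maximal run, so run.length is the true run length.
  return subject.replace(/\n+/g, (run) => (run.length === n ? replacement : run));
}

const text = "a\nb\n\nc\n\n\nd";

const onlySingles = replaceExactRuns(text, 1, "<br />");
const onlyDoubles = replaceExactRuns(text, 2, "<p>");

console.log(JSON.stringify(onlySingles)); // "a<br />b\n\nc\n\n\nd"
console.log(JSON.stringify(onlyDoubles)); // "a\nb<p>c\n\n\nd"
```

This needs neither lookbehind nor an extra captured character, at the cost of running a callback for every run.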
  • Well, you can do it without lookbehind, if you're willing to match more than just the newlines, and deal with that extra character later. – Tim Pietzcker Aug 02 '13 at 08:54