3

I am trying to trim leading and trailing whitespace and newlines from a string. The newlines are written as \n (two separate characters, a backslash and an n). In other words, each one is a literal two-character sequence in the text, not an actual CR or LF control character.

For example, this:

\n \nRight after this is a perfectly valid newline:\nAnd here is the second line. \n

Should become this:

Right after this is a perfectly valid newline:\nAnd here is the second line.

I came up with this solution:

text = text
        .replace(/^(\s*(\\n)*)*/, '') // Beginning
        .replace(/(\s*(\\n)*)*$/, '') // End

These patterns match just fine according to RegexPal.

However, the second pattern (matching the end of the string) takes a very long time: about 32 seconds in Chrome on a string with only a couple of paragraphs and a few trailing spaces. The first pattern is quite fast (milliseconds) on the same string.

Here is a CodePen to demonstrate it.

Why is it so slow? Is there a better way to go about this?

craigpatik
  • Possible duplicate [trim in javascript ? what this code is doing?](http://stackoverflow.com/q/3387088/580951) – Dustin Kingen Jul 31 '13 at 19:08
  • @Romoku I agree that the other question is on the same topic, but I don't see anything there about regex performance. – Pointy Jul 31 '13 at 19:09
  • 4
    You have a great deal of optionality, even nested optionality. That is very slow. Do `.replace(/^\s+/, '').replace(/\s+$/, '')` – Esailija Jul 31 '13 at 19:09
  • 4
    Look up [catastrophic backtracking](http://www.regular-expressions.info/catastrophic.html). This is almost certainly the problem. – Justin Morgan - On strike Jul 31 '13 at 19:09
  • You don't need to write newline as `\\n` in a regex literal. If you do, then you're telling it to match a backslash character followed by the letter "n". Also, `\n` is matched by the special `\s` character class. – Pointy Jul 31 '13 at 19:11
  • \s stands for "whitespace character". It includes [ \t\r\n]. That is: \s will match a space, a tab or a line break. You don't need to add the matches for "\n". Look at Esailija's comment. – Matthew Wesly Jul 31 '13 at 19:13
  • 3
    He wants to match `\n` as a string literal, not a CR LF. That's what he spent the first three paragraphs explaining. This isn't catastrophic backtracking. – Dan Jul 31 '13 at 19:14
  • @Pointy That's exactly what I'm after -- a backslash followed by an 'n'. It is not a single (invisible) newline character. The pattern I posted does work, it's just slow. – craigpatik Jul 31 '13 at 19:14
  • Craig - just for the hell of it, try making your second regex `/(\s|\\n)+$/` and see if anything changes. – Dan Jul 31 '13 at 19:16
  • @Dan Yes it is; the actual characters you want to match are irrelevant. \s already matches a great many different characters by itself. – Esailija Jul 31 '13 at 19:27
  • @craigpatik ah OK I understand. Sorry for misunderstanding that. – Pointy Jul 31 '13 at 19:27

2 Answers

7

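It takes so long because you have a * quantifying a group that itself contains two more * quantifiers. When the match cannot immediately reach the end of the string, the engine may backtrack through a very large number of ways to divide the same characters among the nested quantifiers.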

A good explanation can be found in the PHP manual, but I don't think JavaScript supports once-only subpatterns.

I would suggest this regex instead:

text = text.replace(/^(?:\s|\\n)+|(?:\s|\\n)+$/g,"");
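As a quick check, the suggested pattern can be wrapped in a small helper and run against the example from the question. The function name `trimLiteralNewlines` is made up for this sketch; only the regex comes from the line above.

```javascript
// Hypothetical helper name; the regex is the one suggested above.
function trimLiteralNewlines(text) {
  // (?:\s|\\n)+ consumes one token per iteration: either a single
  // whitespace character or a literal backslash followed by "n".
  return text.replace(/^(?:\s|\\n)+|(?:\s|\\n)+$/g, "");
}

// In a JavaScript string literal, "\\n" is a backslash followed by "n".
const input = "\\n \\nRight after this is a perfectly valid newline:\\nAnd here is the second line. \\n";
const output = trimLiteralNewlines(input);

console.log(output);
// Right after this is a perfectly valid newline:\nAnd here is the second line.
```

A backslash is not a whitespace character, so the two alternatives can never match the same character. With a single + and no quantifier nested inside another, there is only one way to consume a given run of whitespace and \n tokens.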
Niet the Dark Absol
  • That did it. Very quick, thank you. See the "Both, much faster" button in the pen: http://codepen.io/cpatik/pen/dkmel – craigpatik Jul 31 '13 at 19:25
-3

Not a good answer, but one workaround would be to reverse the string, reverse \n to n\ in the regular expression for the beginning (the fast one), apply it, and then reverse the string back.
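A rough sketch of that workaround, for illustration only. The helper names `reverseString` and `trimEndViaReverse` are invented here, and the pattern is a simplified alternation rather than the question's original beginning pattern with the token reversed.

```javascript
// Reverses a string by code point. Combining marks are not handled;
// this is only meant as an illustration.
function reverseString(s) {
  return Array.from(s).reverse().join("");
}

// Hypothetical helper: trims trailing whitespace and literal "\n" sequences
// by running a beginning-anchored pattern on the reversed string.
function trimEndViaReverse(text) {
  const reversed = reverseString(text);
  // In the reversed string a literal "\n" appears as "n\", hence n\\ here.
  const trimmed = reversed.replace(/^(?:\s|n\\)+/, "");
  return reverseString(trimmed);
}

const sample = "And here is the second line. \\n";
console.log(trimEndViaReverse(sample)); // And here is the second line.
```

This costs two extra passes over the string, so it is mostly of interest as a way to see that the beginning-anchored form avoids the slow case.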

Raivo Laanemets