Question

How do you define a regular expression that will match each substring that:

  • ends a line
  • is not preceded by one of a given set of characters

Case

I have a function that removes hardcoded newlines from strings of text, so they will reflow properly. The function works fine, apart from intelligently handling hyphenation.

This is a simplified version of what I have for hyphens.

function (string) { return string.replace(/-\n/g, "") }

It works on things it should work on, no problem. So this...

A hyphen-
ated line.

...becomes...

A hyphenated line.

But it goes too far, and doesn't handle dashes properly, so these examples get garbled:

"""
Mary Rose sat on a pin -
Mary rose.

Mary Rose sat on a pin --
Mary rose.
"""

The function should only consider the -\n pattern a match if it's not preceded by a hyphen or any kind of whitespace character.
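
To make the problem concrete, here is a minimal sketch that runs the simplified pattern over the three examples above. The name `joinHyphenated` is made up for illustration; the original function is anonymous.

```javascript
// Hypothetical name; the body is the simplified function from the question.
function joinHyphenated(string) {
  return string.replace(/-\n/g, "");
}

const hyphenated = joinHyphenated("A hyphen-\nated line.");
const singleDash = joinHyphenated("Mary Rose sat on a pin -\nMary rose.");
const doubleDash = joinHyphenated("Mary Rose sat on a pin --\nMary rose.");

console.log(hyphenated); // A hyphenated line.
console.log(singleDash); // Mary Rose sat on a pin Mary rose.
console.log(doubleDash); // Mary Rose sat on a pin -Mary rose.
```

The first result is what is wanted; the other two lose the dash and the line break because the pattern never looks at what comes before the hyphen.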

Carl Smith
  • There's an [answer to a similar question here](http://stackoverflow.com/a/641432/1253428), but the answer lacks any explanation, so I couldn't figure out how to use it. – Carl Smith Nov 08 '14 at 16:13
  • what is the expected output for the second example? – nu11p01n73R Nov 08 '14 at 16:16
  • No match - it should be the same. The rest of the function already replaces the newlines with spaces for any line that doesn't match. – Carl Smith Nov 08 '14 at 16:18

2 Answers

You can use:

var repl = string.replace(/([^\s-])-\n/g, "$1");
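
As a rough sketch of how this pattern behaves on the examples from the question (the wrapper name `joinHyphenated` is an assumption, not part of the answer):

```javascript
// Hypothetical wrapper around the pattern from this answer.
function joinHyphenated(string) {
  // ([^\s-]) captures one character that is neither whitespace nor a hyphen.
  // "$1" puts that character back, so only the hyphen and newline are dropped.
  return string.replace(/([^\s-])-\n/g, "$1");
}

const hyphenated = joinHyphenated("A hyphen-\nated line.");
const singleDash = joinHyphenated("Mary Rose sat on a pin -\nMary rose.");
const doubleDash = joinHyphenated("Mary Rose sat on a pin --\nMary rose.");

console.log(hyphenated);                                           // A hyphenated line.
console.log(singleDash === "Mary Rose sat on a pin -\nMary rose."); // true (unchanged)
console.log(doubleDash === "Mary Rose sat on a pin --\nMary rose."); // true (unchanged)
```

Because the character class has to consume a character, a `-\n` at the very start of the string is also left alone, which seems harmless for this use case.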

anubhava

You can change your pattern to this:

function (string) { return string.replace(/\b-\n/g, "") }

The word boundary \b matches the position between a word character (\w) and a non-word character, so the hyphen is only matched when it directly follows a letter, digit, or underscore.
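
A small sketch of the word-boundary version on the question's examples, plus two edge cases. The function names and the extra test strings are invented for illustration.

```javascript
// Hypothetical names; the patterns are the ones from the two answers.
function joinWithBoundary(string) {
  // \b requires a word character (\w) immediately before the hyphen.
  return string.replace(/\b-\n/g, "");
}

function joinWithCapture(string) {
  // Accepts any preceding character except whitespace or a hyphen.
  return string.replace(/([^\s-])-\n/g, "$1");
}

const hyphenated = joinWithBoundary("A hyphen-\nated line.");
const singleDash = joinWithBoundary("Mary Rose sat on a pin -\nMary rose.");
const doubleDash = joinWithBoundary("Mary Rose sat on a pin --\nMary rose.");

// The underscore is a word character, so this hyphen is removed too.
const underscore = joinWithBoundary("snake_-\ncase");

// Punctuation before the hyphen is where the two patterns differ.
const parenBoundary = joinWithBoundary("(note)-\nmore");
const parenCapture = joinWithCapture("(note)-\nmore");

console.log(hyphenated);    // A hyphenated line.
console.log(underscore);    // snake_case
console.log(parenCapture);  // (note)more
console.log(parenBoundary === "(note)-\nmore"); // true (unchanged)
```

So the two answers agree on ordinary hyphenated words and on dashes, and only diverge when the hyphen follows a non-word character other than whitespace or another hyphen.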

Casimir et Hippolyte
  • So this will only match if the hyphen is preceded by a letter or number? Nice. – Carl Smith Nov 09 '14 at 11:34
  • @CarlSmith: Yes, but the underscore too (that is a member of the `\w` character class), however, I assume it will not be a problem. – Casimir et Hippolyte Nov 09 '14 at 14:17
  • Not a problem, thanks. This worked perfectly for me; it was a drop in replacement that fixed the problem. @anubhava's solution is nice too, as a general solution for SO. – Carl Smith Nov 09 '14 at 15:33