I'm trying to write a JavaScript regex that matches the last space that is not inside an HTML tag. For example:

// Should match the space between `custom` and `text`
My custom text;

// Should match the space between `a` and `link`
My custom text with <a href="#">a link</a>.

// Should still match the space between `a` and `link`
My custom text with <a href="#">a link</a><span style="color: red;">.</span>

I have the following regular expression (modified from another source) that matches all spaces not in HTML tags: `(?<!<[^>]*)\s(?<![^>]*<)`, but I'm not sure how to take it one step further and match only the last of those spaces.

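At first I thought I could do this: `(?<!<[^>]*)\s(?<![^>]*<)(?=[^\s]*$)`, but that doesn't work with my last example, because the trailing `<span>` tag itself contains spaces, so the lookahead can never reach the end of the string.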

Here's a fiddle.

Any ideas?

Pete
  • In case you were hoping for this to be reliable: you can’t use regex to determine whether a space is in an HTML tag. `(?<!<[^>]*)\s(?<![^>]*<)` has a lot of edge cases. If you want something reliable, use an HTML parser. If not, and you’d like to carry on with this regex: run it in a loop with `exec`, storing the previous match in a variable, and use the stored value when `exec` returns `null`. That’s the last match. (Also… JavaScript regex? You’re okay with the browser support of lookbehinds?) – Ry- Feb 23 '18 at 02:24
  • Why do you need it? – Kosh Feb 23 '18 at 02:33
  • @Ryan Thanks for the info. I guess I didn't realize that this was a tricky thing for regex. Maybe I'll consider another approach. (But hey, "you should consider a different approach altogether" is as useful an answer as any!) – Pete Feb 23 '18 at 18:04
  • @KoshVery It's slightly ghetto, but basically my client really wants to avoid typographical widows. The typical approach is to add an `&nbsp;` between the last two words. I'd like to do that without breaking tags. (As a side note, I'm doing this on the admin side, prior to saving, so that I can avoid the computation and flash-before-nbsp-is-inserted that would appear if I just did it on pages when they loaded). – Pete Feb 23 '18 at 18:13
  • You'd better go the DOM way: get the last text node within whatever element(s) you need to apply this to, and replace the last space in it with a non-breaking one. If that replacement returns the same text content as before (so there was no space in this text node), move on to the second-to-last text node, and so on. https://stackoverflow.com/a/7078792/1427878 shows a way to get all text nodes using XPath and with PHP DOM, https://stackoverflow.com/a/2579869/1427878 has several ways to do the same in JS. – CBroe Feb 23 '18 at 19:25
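
The `exec` loop suggested in the first comment could look roughly like the sketch below. It reuses the regex from the question unchanged, so it inherits that regex's edge cases and needs an engine with lookbehind support. The helper names `lastSpaceOutsideTag` and `protectLastSpace` are made up for illustration.

```javascript
// Hypothetical helper: index of the last whitespace character that the
// question's regex considers to be outside a tag, or -1 if there is none.
function lastSpaceOutsideTag(html) {
  // Regex from the question; created per call so lastIndex starts at 0.
  const re = /(?<!<[^>]*)\s(?<![^>]*<)/g;
  let last = null;
  let match;
  // exec() with the g flag walks through the matches one by one and
  // returns null when there are no more; `last` then holds the final one.
  while ((match = re.exec(html)) !== null) {
    last = match;
  }
  return last === null ? -1 : last.index;
}

// Hypothetical helper: swap that single character for a non-breaking space.
function protectLastSpace(html) {
  const i = lastSpaceOutsideTag(html);
  return i === -1 ? html : html.slice(0, i) + '&nbsp;' + html.slice(i + 1);
}

console.log(protectLastSpace('My custom text with <a href="#">a link<a/><span style="color: red;">.</span>'));
```

Each match is exactly one character long, so the loop always advances and cannot get stuck on an empty match.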

1 Answer

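You need `\s+((\S|<[^>]+>)*)$`, which matches one or more whitespace characters followed, up to the end of the string, by zero or more non-whitespace characters or HTML tags. Because the pattern is anchored with `$`, only the last such run of whitespace can match, and everything after it is captured in group 1 so the replacement can put it back.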

Look at the snippet below:

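var txt1 = 'My custom text.',
    txt2 = 'My custom text with <a href="#">a link</a>',
    txt3 = 'My custom text with <a href="#">a link</a><span style="color: red;">.</span>';

// Whitespace, then only non-whitespace characters or whole tags up to the end of the string.
// The `$` anchor allows at most one match, so the `g` flag is not needed.
var reg = /\s+((\S|<[^>]+>)*)$/;

// $1 restores everything that followed the replaced whitespace.
console.log(txt1.replace(reg, "&nbsp;$1"));
console.log(txt2.replace(reg, "&nbsp;$1"));
console.log(txt3.replace(reg, "&nbsp;$1"));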
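
One thing to be aware of: in the pattern above, `\S` can also consume `<` and `>`, so the two alternatives overlap. As far as I can tell this has two consequences: the engine may backtrack a lot on long strings with many tags, and a string whose only whitespace sits inside a tag gets that in-tag space replaced. A possible variant, offered as a sketch rather than a drop-in fix, keeps tag delimiters out of the plain-character alternative. It assumes `<` and `>` appear in the input only as tag delimiters (literal ones are escaped as `&lt;` and `&gt;`); `glueLastWord` is a hypothetical name.

```javascript
// Variant of the pattern: [^\s<>] never consumes a tag delimiter, so a tag
// can only be matched as a whole by <[^>]+>. Assumes "<" and ">" occur in
// the input only as tag delimiters.
const LAST_SPACE = /\s+((?:[^\s<>]|<[^>]+>)*)$/;

// Hypothetical helper: replace the last run of whitespace outside tags with
// a non-breaking space, keeping everything after it ($1) intact.
function glueLastWord(html) {
  return html.replace(LAST_SPACE, '&nbsp;$1');
}

console.log(glueLastWord('My custom text.'));
console.log(glueLastWord('My custom text with <a href="#">a link</a>'));
console.log(glueLastWord('My custom text with <a href="#">a link</a><span style="color: red;">.</span>'));
// The only whitespace is inside the tag, so the string is left unchanged.
console.log(glueLastWord('word<span class="x">y</span>'));
```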
Kosh