Javascript regexp replace all <br />'s

Question

I'm trying to replace any <br /> tags that appear AFTER a </h2> tag. This is what I have so far:

Text = Text.replace(new RegExp("</h2>(\<br \/\>.+)(.+?)", "g"), '</h2>$2');

It doesn't seem to work. Can anyone help? (No matches are being found.)

Test case:

<h2>Testing</h2><br /><br /><br />Text

To:

<h2>Testing</h2>Text
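For what it's worth, here is a sketch of what the pattern from the question does when run on the literal test string in plain JavaScript (e.g. Node). Inside a string literal, \<, \/ and \> are just <, / and >, so the regex is </h2>(<br />.+)(.+?). The greedy .+ runs to the end of the string and (.+?) then takes back exactly one character, so on this input the pattern does match, but it throws away almost everything. The second check rests on an assumption the question doesn't confirm: that the editor may serialize line breaks as <br> without the slash, in which case there is no match at all.

```javascript
// The pattern from the question, copied as-is. The string literal drops the
// unnecessary backslashes, so the regex source is </h2>(<br />.+)(.+?)
const original = new RegExp("</h2>(\<br \/\>.+)(.+?)", "g");

const input = "<h2>Testing</h2><br /><br /><br />Text";

// (<br />.+) is greedy and consumes everything to the end of the string;
// (.+?) needs at least one character, so the engine backtracks by one.
const result = input.replace(original, "</h2>$2");
console.log(result); // "<h2>Testing</h2>t"

// Assumption: if the markup uses <br> without the slash, the pattern never
// matches and the string comes back unchanged.
const noSlash = "<h2>Testing</h2><br><br>Text";
console.log(noSlash.replace(original, "</h2>$2") === noSlash); // true
```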

It's like you're begging me to post a link to this question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Gabe Moothart, Apr 28 '11 at 22:41
@Gabe, I don't see how; this is for a WYSIWYG editor I'm writing. It turns `\n` into `<br />` and `##title##` into `<h2>Title</h2>`, but now I just want to remove all trailing `<br />` after the `h2`, or it looks bad. – Tom Gullen, Apr 28 '11 at 22:43
Use a parser library if available. You would even be better off just writing a quick and simple character-by-character parser. It would actually be less work, more satisfying, easier to understand and less error-prone than regex, and you can easily add more features when you need them. My rule of thumb is that it's a regular *expression*: only one or two levels up from tokens. You could use a regex to validate a single HTML element or a text node; I would consider that expression-level. But not structured HTML. No doubt someone will come up with a very clever regex that solves your problem. – rohannes, Apr 28 '11 at 22:51
@Rohannes, I think regexp is better, because once the form is submitted the data has to be processed server side to produce the same output, so maintaining a regexp is easier this way. – Tom Gullen, Apr 28 '11 at 23:32
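As a rough illustration of rohannes's suggestion, here is a minimal character-by-character scanner. `scanStripBrAfterH2` is a hypothetical helper, not a library function, and it is nowhere near a real HTML parser: like the question, it only recognizes the exact spellings </h2> and <br />. What it shows is that the "skip the breaks right after a closing heading" rule is a handful of explicit steps, which would arguably be just as easy to mirror on the server as a regex.

```javascript
// Hypothetical helper (not from any library). Walks the string once; whenever
// it copies a "</h2>", it skips any "<br />" tokens that immediately follow.
function scanStripBrAfterH2(html) {
  const CLOSE_H2 = "</h2>";
  const BR = "<br />";
  let out = "";
  let i = 0;
  while (i < html.length) {
    if (html.startsWith(CLOSE_H2, i)) {
      out += CLOSE_H2;
      i += CLOSE_H2.length;
      // Drop a run of <br /> tags directly after the closing tag.
      while (html.startsWith(BR, i)) {
        i += BR.length;
      }
    } else {
      // Any other character is copied through unchanged.
      out += html[i];
      i += 1;
    }
  }
  return out;
}

console.log(scanStripBrAfterH2("<h2>Testing</h2><br /><br /><br />Text"));
// "<h2>Testing</h2>Text"
```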

score 16 · Accepted Answer · answered Apr 28 '11 at 22:42

This is simpler than you're making it out to be:

Text = Text.replace(new RegExp("</h2>(\<br \/\>)*", "g"), "</h2>");

mVChr

You shouldn't capitalize the first letter of a variable holding an instance (like Text); by convention, that's reserved for class names. – Samuel Dauzon Sep 17 '15 at 14:00
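Here is the accepted answer's call run on the test case from the question, plus a second string with two headings to show what the g flag does: every </h2> is handled, and only <br /> tags directly after a </h2> are removed, so the one after "one" survives. The variable name is lowercase here, in line with the comment above.

```javascript
let text = "<h2>Testing</h2><br /><br /><br />Text";

// Same call as the accepted answer; the pattern it builds is </h2>(<br />)*
text = text.replace(new RegExp("</h2>(\<br \/\>)*", "g"), "</h2>");
console.log(text); // "<h2>Testing</h2>Text"

// The g flag handles every heading; breaks not directly after </h2> are kept.
const twoHeadings = "<h2>A</h2><br />one<br /><h2>B</h2><br /><br />two";
const cleaned = twoHeadings.replace(new RegExp("</h2>(\<br \/\>)*", "g"), "</h2>");
console.log(cleaned); // "<h2>A</h2>one<br /><h2>B</h2>two"
```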

score 5 · Answer 2 · answered Apr 28 '11 at 22:42

This would do what you are asking:

Text = Text.replace(new RegExp("</h2>(<br />)*", "g"), '</h2>');

serby
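This builds the same regex as the accepted answer. The backslashes in the accepted answer's string are consumed by the string literal before RegExp ever sees them, so both constructors receive an identical pattern string. A quick check:

```javascript
// Accepted answer's string (with unnecessary escapes) vs. this answer's string.
const escapedSrc = "</h2>(\<br \/\>)*";
const plainSrc = "</h2>(<br />)*";
console.log(escapedSrc === plainSrc); // true

// Same pattern string, so the same regex and the same output.
const input = "<h2>Testing</h2><br /><br /><br />Text";
const a = input.replace(new RegExp(escapedSrc, "g"), "</h2>");
const b = input.replace(new RegExp(plainSrc, "g"), "</h2>");
console.log(a === b); // true
console.log(a); // "<h2>Testing</h2>Text"
```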

score 5 · Answer 3 · answered Apr 28 '11 at 23:20

If you have jQuery kicking around then you can do this safely without regular expressions:

var $dirty = $('<div>').append('<p>Where is<br>pancakes</p><h2>house?</h2><br><br>');
$dirty.find('h2 ~ br').remove();
var clean = $dirty.html();
// clean is now "<p>Where is<br>pancakes</p><h2>house?</h2>"

This will also insulate against the differences between <br>, <br/>, <br />, <BR>, etc.

mu is too short

Thanks, I think going with a regexp is better because I have to duplicate all these rules server side in C# when the form is actually submitted. – Tom Gullen Apr 28 '11 at 23:31
@Tom: I'd recommend that you use an HTML parser (with both element and attribute whitelisting) on the server side too; you should fully scrub everything that comes from the client, even if you're doing client-side scrubbing and even if you fully trust your users. OTOH, this is your project, not mine :) – mu is too short Apr 29 '11 at 04:49
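If you do stay with a regex, as Tom prefers, a pattern can at least cover the common spellings of the break tag. This is a sketch, not a substitute for a parser: `stripBrVariantsAfterH2` is a hypothetical helper, it still won't match attributes such as <br class="x">, and unlike the h2 ~ br selector (which removes every later sibling <br>, even one separated from the heading by text) it only removes breaks that directly follow the closing tag, optionally separated by whitespace.

```javascript
// Hypothetical helper, not from any library.
//   (<\/h2>)     capture the closing tag so its original case is kept
//   \s*<br       optional whitespace, then the tag name
//   \s*\/?>      optional space and optional self-closing slash
//   (?: ... )+   one or more breaks in a row
//   i flag       also matches <BR> and </H2>
const BR_RUN_AFTER_H2 = /(<\/h2>)(?:\s*<br\s*\/?>)+/gi;

function stripBrVariantsAfterH2(html) {
  return html.replace(BR_RUN_AFTER_H2, "$1");
}

console.log(stripBrVariantsAfterH2("<h2>Testing</h2><br><br/><br /><BR>Text"));
// "<h2>Testing</h2>Text"
console.log(stripBrVariantsAfterH2("<H2>Testing</H2> <br /> <BR>Text"));
// "<H2>Testing</H2>Text"
```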

score 3 · Answer 4 · answered Apr 28 '11 at 22:48

You can also make this a little nicer using the shorthand (regex literal) syntax:

Text = Text.replace(/<\/h2>(<br\s*\/>)*/g, '</h2>');

serby
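The \s* in this version allows any amount of whitespace (including none) before the slash, so <br/> and <br /> are both removed; the slash itself is still mandatory, so a plain <br> is left in place. A quick check:

```javascript
const re = /<\/h2>(<br\s*\/>)*/g;

// \s* matches zero or more whitespace characters, so both spellings go.
const a = "<h2>A</h2><br/><br />x".replace(re, "</h2>");
console.log(a); // "<h2>A</h2>x"

// \/ is not optional, so <br> without a slash is not matched.
const b = "<h2>A</h2><br>x".replace(re, "</h2>");
console.log(b); // "<h2>A</h2><br>x"
```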

I'd change the `*` to a `+`. Otherwise, you are unnecessarily replacing `</h2>` with `</h2>` when there are zero `<br />` tags. – ridgerunner Apr 28 '11 at 23:23
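To see the difference the comment describes, count how often the replacement runs. With *, the group can match zero times, so a bare </h2> still counts as a match; with +, at least one break is required. The visible output is the same either way.

```javascript
const noBreaks = "<h2>A</h2>no breaks here";
let starCalls = 0;
let plusCalls = 0;

// * : the group may match zero times, so the bare </h2> is still a match.
const starOut = noBreaks.replace(/<\/h2>(<br\s*\/>)*/g, () => {
  starCalls += 1;
  return "</h2>";
});

// + : at least one <br /> is required, so nothing matches here.
const plusOut = noBreaks.replace(/<\/h2>(<br\s*\/>)+/g, () => {
  plusCalls += 1;
  return "</h2>";
});

console.log(starCalls, plusCalls); // 1 0
console.log(starOut === plusOut); // true

// On the question's test case, + still removes the whole run of breaks.
const fixed = "<h2>Testing</h2><br /><br /><br />Text".replace(/<\/h2>(<br\s*\/>)+/g, "</h2>");
console.log(fixed); // "<h2>Testing</h2>Text"
```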
