JS regular expression works at regular-expressions.info, but not in my code

Question

I have this string in JavaScript:

s = "</p><ol><li>First\n</li><li>Second\n</li></ol><p>"

Then I do this (to remove the outer "</p>...<p>"):

s = s.replace(/^<\/([^> ]+)[^>]*>(.*)<\1>$/,"$2");

Nothing happens (s is unchanged, and match() returns null), but if I try it at http://www.regular-expressions.info/javascriptexample.html, it works!

I've tried all sorts of things (creating a separate RegExp object, using //g, taking out the ^$, replacing [^> ]+ with [a-z0-9]*...) but nothing makes any difference.

It's driving me nuts. Can anyone tell me what I'm doing wrong?

What you're doing wrong is using Regular Expressions to parse HTML. HTML is not a Regular Language. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags More than that, it's invalid HTML — Benjamin Gruenbaum, Mar 14 '13 at 11:50
@BenjaminGruenbaum: While that's a fun answer, it's completely incorrect in this case. OP is just trying to remove a starting tag, and matching ending tag, for this a regex will do perfectly fine. — Evert, Mar 14 '13 at 11:52
It's not about being fun, it's about being completely wrong in the approach. Using a RegExp to parse HTML is wrong, HTML is not a regular language, it's _very_ easy to show that it's impossible to build a DFA to parse it. Even if it's possible to build such a DFA (or Regex) for this specific case, it's the wrong tool to approach this problem. Just because it's possible doesn't mean one should do it. — Benjamin Gruenbaum, Mar 14 '13 at 11:54
Note that the problem here is that the string starts with a closing tag, and ends with an opening tag. That's why I'm using an RE: it isn't well-formed HTML. — user1636349, Mar 14 '13 at 12:33
Problem now solved as per MikeM below. And meanwhile if anyone took BenjaminGruenbaum's advice they'd still be crafting an LL(k) sledgehammer for a very very small RE nut. — user1636349, Mar 17 '13 at 16:05

score 1 · Accepted Answer · answered Mar 14 '13 at 12:05

The problem is simply that . does not match newline characters (\n).

If you replace the .* with [\s\S]*, your regex should work.

[\s\S] matches any whitespace or non-whitespace character, which amounts to matching any character at all, newlines included.
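
As a minimal sketch of the fix, the snippet below runs the question's original pattern and the [\s\S] version against the question's string. The variable names (original, fixed, dotAll) are made up for illustration. The last variant uses the s (dotAll) flag, which was added to the language in ES2018, after this answer was written, so it assumes a reasonably recent engine.

```javascript
// The string from the question: a stray closing tag at the start,
// the matching opening tag at the end, and newlines in between.
const s = "</p><ol><li>First\n</li><li>Second\n</li></ol><p>";

// Original pattern: "." stops at "\n", so the overall match fails
// and replace() returns the input unchanged.
const original = s.replace(/^<\/([^> ]+)[^>]*>(.*)<\1>$/, "$2");

// Fixed pattern: [\s\S] matches any character, including "\n".
const fixed = s.replace(/^<\/([^> ]+)[^>]*>([\s\S]*)<\1>$/, "$2");

// Alternative (assumes an ES2018+ engine): the "s" flag makes "." match "\n".
const dotAll = s.replace(/^<\/([^> ]+)[^>]*>(.*)<\1>$/s, "$2");

console.log(original === s);        // true: nothing was replaced
console.log(JSON.stringify(fixed)); // "<ol><li>First\n</li><li>Second\n</li></ol>"
console.log(dotAll === fixed);      // true on engines that support the "s" flag
```

On engines that predate ES2018 the /s literal is a syntax error, so [\s\S] remains the portable choice.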

MikeM


Magic. Thank you very much. (I tried //s earlier and it didn't work; doesn't JS support this?) – user1636349 Mar 14 '13 at 12:38
@user1636349. No, JS doesn't have a single-line mode. – MikeM Mar 14 '13 at 14:41
Thanks, hadn't realised. Spot the newbie. – user1636349 Mar 17 '13 at 16:01
