How to use RegExp to convert nested tags

Question

As mentioned in the title, I am trying to convert some HTML tags to custom tags with RegExp.

My solution does not work with nested tags, as shown below:

Solution 1:

var str = '<span style=\"font-size: x-large;\"><span style=\"color: red;\">HELLO WORLD</span></span>';
var txt = str.replace(/<span style=\"(font-size|color): (.*?);\">(.*?)<\/span>/gim,"[$2]$3[/$2]");

Expected result:

[x-large][red]HELLO WORLD[/red][/x-large]

Actual result:

[x-large]<span style="color: red;">HELLO WORLD[/x-large]</span>

Solution 2:

var str = '<span style=\"font-size: x-large;\"><span style=\"color: red;\">HELLO WORLD</span></span>';
var txt = str.replace(/<span style=\"(font-size|color): (.*?);\">(.*?)<\/span>/gim,"[$2]$3[/$2]");
txt = txt.replace(/<span style=\"(font-size|color): (.*?);\">(.*?)<\/span>/gim,"[$2]$3[/$2]");

Expected result:

[x-large][red]HELLO WORLD[/red][/x-large]

Actual result:

[x-large][red]HELLO WORLD[/x-large][/red]

Regexp is not smart enough to handle languages like HTML which involve nesting, sorry. — , Aug 10 '17 at 03:19
[You shouldn't](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). — MC Emperor, Aug 10 '17 at 06:51
Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — pchaigno, Nov 08 '17 at 13:15

score 0 · Answer 1 · answered Aug 10 '17 at 06:45

TL;DR: You cannot parse an arbitrary number of nested HTML tags with regular expressions alone. You need some memory of the opening tags you have already parsed in order to match the closing tags correctly.
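Applied to the question, that memory can be a plain array used as a stack. Below is a minimal sketch, assuming the input only contains fragments like the one in the question (the helper name `convertSpans` is made up for illustration). The regular expression only picks out one tag at a time, which is a regular task; the stack decides which custom tag each `</span>` closes. Spans with any other attributes are left unchanged.

```javascript
// Hypothetical helper: convertSpans. A sketch for fragments like the question's,
// not an HTML parser (no comments, CDATA, entities, etc.).
function convertSpans(html) {
  const tokenRe = /<span\b[^>]*>|<\/span>/gi;                        // one tag at a time
  const styleRe = /^<span style="(?:font-size|color): ([^;"]*);">$/i; // styles from the question
  const stack = []; // one entry per open <span>: custom tag name, or null if kept as-is
  let out = "";
  let last = 0;
  let m;
  while ((m = tokenRe.exec(html)) !== null) {
    out += html.slice(last, m.index); // copy the text between tags unchanged
    if (m[0][1] !== "/") {
      // Opening tag: remember how it was converted so the matching close agrees.
      const s = styleRe.exec(m[0]);
      const name = s ? s[1] : null;
      stack.push(name);
      out += name !== null ? `[${name}]` : m[0];
    } else {
      // Closing tag: it always closes the most recently opened <span>.
      if (stack.length === 0) throw new Error("unexpected </span>");
      const name = stack.pop();
      out += name !== null ? `[/${name}]` : m[0];
    }
    last = tokenRe.lastIndex;
  }
  if (stack.length > 0) throw new Error("unclosed <span>");
  return out + html.slice(last);
}

const str = '<span style="font-size: x-large;"><span style="color: red;">HELLO WORLD</span></span>';
console.log(convertSpans(str)); // [x-large][red]HELLO WORLD[/red][/x-large]
```

Because the closing tag is emitted from whatever is on top of the stack, the custom tags come out in the reverse order they were opened, which is exactly what the two-pass `replace` in Solution 2 gets wrong.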

Note: You might be able to handle a bounded (finite) nesting depth with regular expressions, although the pattern quickly becomes a mess.
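To make the note concrete, here is a sketch that builds a regular expression accepting `<span>` nesting up to a chosen depth (`nestedSpanRegex` is a hypothetical helper). Every extra level wraps the whole previous pattern, so the pattern keeps growing, and any single pattern still rejects input one level deeper than it was built for.

```javascript
// Hypothetical helper: returns a RegExp that accepts text whose <span>...</span>
// nesting is at most `depth` levels deep (attributes ignored, other tags not allowed).
function nestedSpanRegex(depth) {
  let inner = "[^<]*"; // depth 0: plain text, no tags at all
  for (let d = 0; d < depth; d++) {
    // Text, optionally interleaved with complete spans whose content is `inner`.
    inner = `[^<]*(?:<span[^>]*>${inner}</span>[^<]*)*`;
  }
  return new RegExp(`^${inner}$`);
}

const str = '<span style="font-size: x-large;"><span style="color: red;">HELLO WORLD</span></span>';
console.log(nestedSpanRegex(1).test(str)); // false: the input is two levels deep
console.log(nestedSpanRegex(2).test(str)); // true
console.log(nestedSpanRegex(2).source.length, nestedSpanRegex(5).source.length);
```

This only checks that the nesting is well formed up to a fixed depth; doing the conversion from the question this way would be messier still.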

Why can't we parse HTML with regular expressions?

In the Chomsky hierarchy, HTML is a context-free language, whereas regular expressions correspond to regular languages. A regular expression corresponds to a finite-state automaton, which is simply not (computationally) powerful enough to recognize context-free languages. To recognize a context-free language you need a pushdown automaton.

To parse an arbitrary number of nested HTML tags, you'd need some memory of the opening tags you parsed in order to properly close them. This memory can be structured as a stack (you push for opening tags, and pop for closing tags). This is exactly what a pushdown automaton is: a finite state automaton with a stack.
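The push/pop idea can be sketched as a small checker. `isWellNested` is a hypothetical helper, and it deliberately ignores real-HTML details such as void elements (`<br>`), self-closing tags, comments and optional end tags, so treat it as an illustration of the pushdown idea rather than an HTML validator. The regular expression only finds individual tags; the array used as a stack holds the memory that a finite-state automaton lacks.

```javascript
// Hypothetical helper: pushdown-style check that opening and closing tags nest properly.
// Does not handle void elements, self-closing tags, comments or optional end tags.
function isWellNested(html) {
  const tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  const stack = [];
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const [, slash, name] = m;
    if (slash === "") {
      stack.push(name.toLowerCase());            // opening tag: push
    } else if (stack.pop() !== name.toLowerCase()) {
      return false;                              // closing tag must match the top (or stack was empty)
    }
  }
  return stack.length === 0;                     // every opened tag was closed
}

console.log(isWellNested("<span><span>HELLO WORLD</span></span>")); // true
console.log(isWellNested("<b><i>HELLO</b></i>"));                   // false
```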
