I have the following HTML:

<p>This is a tag</p>
<div>Another tag</p>
<div><a href="#">anchor</a><div>
<br>
<br>
<math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi>x</mi>
    <mo>=</mo>
</math>
<hr><br>

I want to extract all HTML and MathML into an array and keep their order:

[
   [0] => '<p>This is a tag</p>
    <div>Another tag</p>
    <div><a href="#">anchor</a><div>
    <br>
    <br>',
   [1] => '<math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>x</mi>
        <mo>=</mo>
    </math>',
   [2] => '<hr><br>'
]

Can a regex do this, given that HR or BR tags may not have a closing slash? Or is there a library that can?

Any help will be greatly appreciated. Thanks in advance.

trinvh
  • Use an HTML parser, not regex, e.g. http://simplehtmldom.sourceforge.net/ – GeorgeQ Apr 11 '17 at 02:19
  • Further reading: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?rq=1 – Scopey Apr 11 '17 at 02:20

1 Answer

Use this regex:

"#(.*)(<math.*?</math>)(.*)#s"
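
Note that the pattern above captures only one <math> block per match (the greedy first group makes it the last block in the input). If the document can contain several, splitting on the blocks may be easier than matching around them. Below is a minimal sketch of that idea in Python, not the original answer's code; the names MATH_BLOCK and split_html_and_mathml are made up for illustration. It assumes <math> elements are never nested and that "<math" never appears inside comments or attribute values. In PHP the same pattern should work with preg_split and the PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY flags.

```python
import re

# The capturing group makes re.split keep the <math> blocks in the result
# instead of discarding them as separators.
MATH_BLOCK = re.compile(r"(<math\b.*?</math>)", re.DOTALL | re.IGNORECASE)


def split_html_and_mathml(markup):
    """Return ordered chunks: each is either one <math> block or the HTML between blocks."""
    parts = MATH_BLOCK.split(markup)
    # Drop chunks that are empty or whitespace-only, and trim the rest.
    return [part.strip() for part in parts if part.strip()]


sample = """<p>This is a tag</p>
<div>Another tag</p>
<div><a href="#">anchor</a><div>
<br>
<br>
<math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi>x</mi>
    <mo>=</mo>
</math>
<hr><br>"""

chunks = split_html_and_mathml(sample)
for index, chunk in enumerate(chunks):
    print(index, repr(chunk))
```

Because the HTML between the blocks is never parsed, unclosed <br> or <hr> tags (or the mismatched tags in the sample) do not matter here. If you later need to work on the HTML chunks themselves, a real parser, as the comments suggest, is the safer route.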
Sandeep Kothari