Example string (HTML content):
some content
<h2>title 1</h2>
<p>more content</p>
<h2>title 2</h2>
rest of the content
I need to split this into an associative array on the <h2></h2> tags, yet keep all contents of the string.
Desired outputs:
array(){
'text1' => 'some content',
'title1' => 'title 1',
'text2' => '<p>more content</p>',
'title2' => 'title 2',
'text3' => 'rest of the content'
}
or
array(){
[0] => {
'text' => 'some content',
'title' => 'title 1'
},
[1] => {
'text' => '<p>more content</p>',
'title' => 'title 2'
},
[2] => {
'text' => 'rest of the content'
}
}
What I tried
preg_split() with PREG_SPLIT_DELIM_CAPTURE almost does the job, but it outputs an indexed array.
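The indexed array from preg_split() can be regrouped into the second desired format. The following is only a sketch: the helper name splitByH2 is hypothetical, and it assumes plain, non-nested <h2> headings in reasonably well-formed markup. With a single capture group and PREG_SPLIT_DELIM_CAPTURE, the result alternates text, title, text, title, ..., text, so pairing even and odd positions gives the text/title entries.

```php
<?php
// Hypothetical helper; assumes simple, non-nested <h2>...</h2> headings.
function splitByH2(string $html): array
{
    // One capture group + PREG_SPLIT_DELIM_CAPTURE yields:
    // [text, title, text, title, ..., text]
    $parts = preg_split('~<h2\b[^>]*>(.*?)</h2>~is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = [];
    // Even index = text before a heading, odd index = the heading's inner text.
    for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
        $entry = ['text' => trim($parts[$i])];
        if (isset($parts[$i + 1])) {
            $entry['title'] = trim($parts[$i + 1]);
        }
        $result[] = $entry;
    }
    return $result;
}

$html = "some content\n<h2>title 1</h2>\n<p>more content</p>\n<h2>title 2</h2>\nrest of the content";
print_r(splitByH2($html));
```

The last entry has no 'title' key because no heading follows the trailing text. If the string starts with a heading, the first entry's 'text' is an empty string.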
I also tried using a regex, but it fails to capture text3:
(.*?)(<h2.*?<\/h2>)
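The pattern above misses the trailing text because every match must end in a heading. One possible adjustment, sketched here under the assumption of plain, non-nested <h2> headings, is to let each match end either at a heading or at the end of the string (\z). The variable names are illustrative only.

```php
<?php
$input = "some content\n<h2>title 1</h2>\n<p>more content</p>\n<h2>title 2</h2>\nrest of the content";

// Text up to the next heading, followed by either a heading or the end of the string.
$pattern = '~(.*?)(?:<h2\b[^>]*>(.*?)</h2>|\z)~is';
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$sections = [];
foreach ($matches as $m) {
    $text = trim($m[1]);
    // Group 2 is absent when the match ended at \z instead of at a heading.
    $hasTitle = isset($m[2]);
    // The final attempt at the very end of the string can produce an empty match; skip it.
    if ($text === '' && !$hasTitle) {
        continue;
    }
    $section = ['text' => $text];
    if ($hasTitle) {
        $section['title'] = trim($m[2]);
    }
    $sections[] = $section;
}
print_r($sections);
```

The s modifier lets . span newlines, so multi-line text between headings is captured as one block.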
Any help or ideas would be much appreciated.
`(.*?)|\s*(.+?)\s*(?=.*?|$))` Forget that duplicate junk. Parsing HTML with a DOM will fail if the HTML is junked up. Use something that works. Or you could try to find a DOM parser that can get past malformed HTML (and you can't). – Feb 22 '16 at 15:56