I am trying to scrape a bit of HTML, and the structure comes out like this.
//blockquote
<h2>1. text text</h2>
<p>1. paragraph paragraph</p>
<h2>2. text text</h2>
<p>2. paragraph paragraph</p>
<h2>3. text text</h2>
<p>3a. paragraph paragraph</p>
<p>3b. paragraph paragraph</p>
<h2>4. text text</h2>
<p>4. paragraph paragraph</p>
Initially I was hooking into the paragraph tags, but I noticed that some blocks have more than one paragraph. At this point I am unsure how to adjust the explode call I had in place.
$paras = explode("<p>", $paras);
So the final array needs to look more like this.
array(
    "<p>1. paragraph paragraph</p>",
    "<p>2. paragraph paragraph</p>",
    "<p>3a. paragraph paragraph</p><p>3b. paragraph paragraph</p>",
    "<p>4. paragraph paragraph</p>"
);
This is how the code currently looks.
foreach ($lookuphtml->find('blockquote') as $text) {
    $paras = $text->innertext;
    $paras = explode("<p>", $paras);
}
//actual contents look like this
<blockquote><h2 class="left">History</h2><p>Opened October 1997 as the first brewery in Bath since 1956. The brewery is located in an outbuilding behind Ye Old Farmhouse public house.</p><h2 class="left">Beers Brewed</h2><p>We do not maintain a list of beers brewed by each brewery. There may be a list on the brewery's own website and we suggest you also visit the entry for Abbey Ales Ltd on the independent <a href="http://www.beermad.org.uk/brewery/2" rel="external" target="_blank">www.beermad.org.uk</a>.</p><h2 class="left">Regular Outlets</h2><p>The brewery has 4 pubs :</p><p>The Star, 23 Vineyards, Bath, BA1 5NA <br>The Coeur de Lion, Northumberland Place, Bath, BA1 5AR<br>The Foresters, 58 Goose Street, Beckington, Frome, BA11 6SS<br>The Assembly, 16-17 Alfred Street, Bath, BA1 2QU</p><h2 class="left">Visit Information</h2><p>Information on visit availability can be found on the breweries web site.</p><h2 class="left">Brewery Shop Information</h2><p>The brewery does not have a shop, but sells a variety of items via it's web site.</p></blockquote>
...Answer
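Never mind, guys. Here is the solution.
foreach ($lookuphtml->find('blockquote') as $text) {
    $paras = $text->innertext;
    // Replace every <h2 class="left">...</h2> heading with a placeholder marker.
    $paras = preg_replace("/<h2 class=\"left\">(.*?)<\/h2>/", "#~", $paras);
    // Split on the marker: each piece holds the paragraphs of one section.
    $pa = explode("#~", $paras);
    // Drop the first piece, which is the empty string before the first heading.
    $pa2 = array_slice($pa, 1);
}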
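For anyone landing here later, the same idea can be sketched without a placeholder string by splitting directly on the headings with preg_split. This is only a minimal sketch, not a definitive implementation: the helper name splitByHeadings is made up for illustration, the sample markup is the simplified structure from the question, and it assumes the headings are never nested and the markup is regular enough for a regex. Unlike the pattern above, it does not require class="left" on the h2.

```php
<?php
// Hypothetical helper (name invented for this example): split the inner HTML
// of a blockquote into one string per <h2> section.
function splitByHeadings(string $html): array
{
    // Split on any <h2 ...>...</h2> heading, whatever its attributes.
    // "s" lets "." match newlines inside a heading; "i" ignores tag case.
    $parts = preg_split('/<h2\b[^>]*>.*?<\/h2>/is', $html);

    // Trim whitespace and drop empty pieces, such as the empty string
    // that precedes the first heading.
    $parts = array_filter(array_map('trim', $parts), 'strlen');

    // Re-index from 0 so the result is a plain list.
    return array_values($parts);
}

// Simplified sample markup taken from the question.
$html = '<h2>1. text text</h2><p>1. paragraph paragraph</p>'
      . '<h2>2. text text</h2><p>2. paragraph paragraph</p>'
      . '<h2>3. text text</h2><p>3a. paragraph paragraph</p><p>3b. paragraph paragraph</p>'
      . '<h2>4. text text</h2><p>4. paragraph paragraph</p>';

$groups = splitByHeadings($html);
print_r($groups);
```

Inside the original loop this would be called as splitByHeadings($text->innertext). The third entry of $groups then holds both the 3a and 3b paragraphs together.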
– The Old County Aug 22 '16 at 09:34