I'm trying to retrieve the opening `<h1>` tags from an HTML string. I want to include everything from `<h1` to the closing `>`.
This is how I'm currently doing it, but it seems to cause encoding problems: when I print the resulting `$html`, UTF-8 characters are displayed incorrectly:
```php
$dom = new DOMDocument();
$dom->loadHTML($html);

// Find every <h1> element inside <body>
$xpath = new DOMXPath($dom);
$elements = $xpath->evaluate("/html/body//h1");
foreach ($elements as $element) {
    print_r($element);
}

// Save the HTML
$html = $dom->saveHTML();
```
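A likely cause of the garbled characters is that `DOMDocument::loadHTML()` typically treats markup with no charset declaration as ISO-8859-1, so each byte of a multi-byte UTF-8 character (for example `é`, bytes C3 A9) is decoded as a separate character (`Ã©`). A minimal sketch, assuming the input really is UTF-8, is to prepend a charset `<meta>` tag so libxml2 decodes it correctly. The sample string stands in for the real `$html`.

```php
<?php
// Sketch: without a charset declaration, loadHTML() usually decodes the input
// as ISO-8859-1, splitting multi-byte UTF-8 characters into several characters.
// Prepending a charset <meta> tag tells libxml2 the input is UTF-8.
// The sample string below is a hypothetical stand-in for the real $html.
$html = "<h1 class=\"title\">Caf\u{00E9}</h1>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$h1 = $xpath->query('/html/body//h1')->item(0);
echo $h1->textContent, "\n"; // DOM strings are returned as UTF-8
```

Note that the injected `<meta>` becomes part of the parsed document, so it will also appear in the `saveHTML()` output. Prepending `<?xml encoding="UTF-8">` instead is another commonly cited variant of the same trick.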
How can I make sure the result includes everything up to the closing `>`?
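One possible approach, sketched under the assumption that a normalized tag is acceptable: since `print_r()` on a `DOMElement` does not show the markup, rebuild each opening tag from the element's name and attributes instead of slicing serialized HTML at the first `>`. `openingTag()` is a hypothetical helper, not part of the DOM API, and the sample markup is illustrative only.

```php
<?php
// Sketch: rebuild the opening "<h1 ...>" tag of each matched element from its
// node name and attributes. openingTag() is a hypothetical helper name.
function openingTag(DOMElement $el): string
{
    $tag = '<' . $el->nodeName;
    foreach ($el->attributes as $attr) {
        // Re-escape the value so quotes, "&", "<" and ">" inside it stay safe.
        $tag .= sprintf(
            ' %s="%s"',
            $attr->nodeName,
            htmlspecialchars($attr->value, ENT_QUOTES, 'UTF-8')
        );
    }
    return $tag . '>';
}

// Hypothetical sample input; the second attribute value contains a ">".
$html = '<h1 class="title" data-note="a > b">Hello</h1><h1 id="second">World</h1>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
// Declare UTF-8 so non-ASCII attribute values are decoded correctly.
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tags = [];
foreach ($xpath->query('/html/body//h1') as $h1) {
    $tags[] = openingTag($h1);
}
print_r($tags);
```

The rebuilt tag is equivalent to, but not always byte-identical with, the source: attribute quoting and escaping are normalized, and a value-less attribute such as `hidden` comes back as `hidden=""`.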
See what happens if you use regex to parse HTML: http://stackoverflow.com/a/1732454/1529630
– Oriol Sep 18 '14 at 14:20
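To make the comment's point concrete, here is a minimal sketch (with hypothetical markup) of how a naive pattern that stops at the first `>` cuts an opening tag short when an attribute value legitimately contains `>`. A DOM parser avoids this failure.

```php
<?php
// Sketch of the regex pitfall: "[^>]*" stops at the first ">", even when that
// ">" is inside a quoted attribute value. The sample markup is hypothetical.
$html = '<h1 title="a > b" class="x">Heading</h1>';

preg_match_all('/<h1\b[^>]*>/i', $html, $m);
print_r($m[0]); // the single match ends inside the title attribute
```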