I need to read some content from an HTML page.
I've tested simple_html_dom, but it simply isn't usable for what I need.
I need something like this (pseudo-syntax based on simple_html_dom):
$html = file_get_contents($url);
$html_obj = parse_html($html);
$title = $html_obj->get('title');
$meta1 = $html_obj->get('meta[name=description]', 'innertext'); // text only
$meta2 = $html_obj->get('meta[name=keywords]', 'innertext'); // text only
$content = $html_obj->get('div[id=section_a]', 'outertext'); // HTML code
I've tested simple_html_dom in so many ways, and only managed to get parts of what I need. It simply isn't "simple".
I've also tested PHP DOMDocument::loadHTML, but I run into problems dealing with inline <script>.
Are there any PHP libraries that make it as easy to get content as in jQuery?
Update
One of my problems is a piece of third-party JavaScript from an ad agency:
<script language="javascript" type="text/javascript">
<!--
if (window.adgroupid == undefined) {
window.adgroupid = Math.round(Math.random()*100000);
}
document.write('<scr'+'ipt language="javascript1.1" type="text/javascript" src="http://adserver.adtech.de/addyn|3.0|994|3159100|0|-1|size=980x150|ADTECH;loc=100;target=_blank;key=startside,kvinner, kvinnesak, bryllup, graviditet, mamma, kosmetikk, markedsplass, dagbok, feminisme;grp='+window.adgroupid+';misc='+new Date().getTime()+'"></scri'+'pt>');
//-->
</script>
Even if I change <scr'+'ipt to <script, it gives me invalid JavaScript code.