
I'm trying to retrieve the YouTube link from a certain site, but the Simple HTML DOM parser can't find the link I'm looking for.

$new_html = file_get_html("https://www.bia2.com/video/Amir-Shamloo/Delam-Tange/");
foreach ($new_html->find('href') as $youtube) {
    echo $youtube;
}

It should find the link: https://www.youtube.com/watch?v=vJ2aNG0aJPU.

Does anyone know what the problem is here?


1 Answer


That particular link is inserted by JavaScript: the page calls onYouTubeIframeAPIReady("vJ2aNG0aJPU") during the onload event.

SimpleHtmlDom (or any other PHP-based HTML parser, for that matter) will not execute any JavaScript. These libraries only parse the markup returned by the web server.

You'd need a scraper capable of executing JavaScript before you can scrape it. Or you can match the argument to that function and assemble the link yourself.
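Matching the argument could look roughly like the sketch below. The helper name extractYoutubeUrl and the sample markup are made up for illustration, and the pattern assumes the call appears in the page source with a quoted ID, as it does in the call quoted above; adjust it if the real markup differs.

```php
<?php
// Hypothetical helper: pull the video ID out of the page source
// and assemble the watch URL from it.
function extractYoutubeUrl(string $html): ?string
{
    // Matches onYouTubeIframeAPIReady("ID") or onYouTubeIframeAPIReady('ID').
    // The ID is assumed to consist of letters, digits, "-" and "_".
    $pattern = '/onYouTubeIframeAPIReady\(\s*["\']([A-Za-z0-9_-]+)["\']\s*\)/';

    if (preg_match($pattern, $html, $m) === 1) {
        return 'https://www.youtube.com/watch?v=' . $m[1];
    }

    return null; // the call was not found in the markup
}

// Sample markup standing in for the real page source (an assumption).
$sample = '<body onload=\'onYouTubeIframeAPIReady("vJ2aNG0aJPU")\'>...</body>';
echo extractYoutubeUrl($sample), PHP_EOL;
```

For the real page you would pass in the raw source, for example the result of file_get_contents() on the video URL, instead of $sample.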

On a side note: $new_html->find('href') looks for elements named "href", which is wrong here. To get every element that has an href attribute, use the selector *[href] instead.
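With SimpleHtmlDom that means iterating over $new_html->find('*[href]') and reading $element->href on each result. As a self-contained illustration of the same idea, here is a sketch that swaps in PHP's built-in DOM extension, where the equivalent XPath query is //*[@href]. The sample markup is invented for the example.

```php
<?php
// Sample markup (an assumption, not the real page).
$html = '<html><head><link href="/style.css"></head>'
      . '<body><a href="/video/one/">One</a><p>No link here</p></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];

// "//*[@href]" selects every element that carries an href attribute,
// whatever its tag name is.
foreach ($xpath->query('//*[@href]') as $element) {
    $hrefs[] = $element->getAttribute('href');
}

print_r($hrefs);
```

Like SimpleHtmlDom, this only sees attributes present in the markup the server returned, so it still will not find a link that JavaScript adds later.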

On another side note: SimpleHtmlDom is a crap library. Consider your options.

Gordon
  • Ty, it seems that this is indeed the problem. – wouter.lilopel Feb 15 '16 at 11:36
  • I'm looking around for something that can help me but can't really find it. Do you know of something that can execute JS before my Simple DOM parser retrieves the hrefs? – wouter.lilopel Feb 15 '16 at 12:12
  • @wouter.lilopel not in PHP. You will likely need to look into a nodejs solution instead (http://phantomjs.org). Or you just match the ID as suggested. – Gordon Feb 15 '16 at 12:15
  • What do you mean by matching the ID? Can you give an example of that? – wouter.lilopel Feb 15 '16 at 12:18
  • The preg_match_all is still a bit confusing. I'm trying to do the same for https://www.bia2.com/video/ but it matches way too much. I'm trying to get the links to the different video pages. I used '#a href="/video/(.*)"#' but I get some extras I don't want. – wouter.lilopel Feb 16 '16 at 12:19
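On the last comment: (.*) is greedy, so it runs on to the last double quote on the line instead of stopping at the end of the href value, which is the likely source of the extras. A character class such as [^"]+ stops at the closing quote. The sketch below uses invented sample markup, and the shape of the real listing page is an assumption.

```php
<?php
// Sample markup (an assumption about the shape of the listing page).
$html = '<a href="/video/Artist-One/Song-A/" class="x">A</a> '
      . '<a href="/video/Artist-Two/Song-B/">B</a> <a href="/music/other/">C</a>';

// [^"]+ matches up to the closing quote of the href value,
// unlike the greedy (.*), which would continue to the last quote on the line.
preg_match_all('#<a\s[^>]*href="(/video/[^"]+)"#i', $html, $matches);

// $matches[1] holds the captured paths; drop duplicates and reindex.
$links = array_values(array_unique($matches[1]));

print_r($links);
```

The captured values are site-relative paths, so prepend https://www.bia2.com to each one before requesting it.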