I'm using PHP to scrape a few websites. The image information is contained within a script.

<body>
  <div>something</div>
  <div>Something else</div>
  <script type="text/javascript" language="javascript">
      var imgs = ['<img alt="image1" class="happy-image" src="http://example.com/image1.jpg" title = "Image 1">', '<img alt="image2" class="happy-image" src="http://example.com/image2.jpg" title = "Image 2">'];

  </script>
</body>

I would like to use PHP to extract the information associated with each image from this string, but I don't know where to begin writing a regex to do this.

Adam S
  • too bad you can't use javascript. you could create an element from the string and then extract the `src` attribute – gillyspy May 17 '13 at 01:11
  • oh look this is a repeat. http://stackoverflow.com/questions/138313 – gillyspy May 17 '13 at 01:22
  • possible duplicate of [How to extract img src, title and alt from html using php?](http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php) – halfer May 18 '13 at 11:01

1 Answer

Your safest bet would be to parse the HTML with DOMDocument, extract the script's contents, then parse that as HTML. This will give you access to the images. Like so:

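// Collect libxml warnings instead of printing them; scraped markup is rarely valid.
libxml_use_internal_errors(true);

// Parse the full page and grab the first <script> element.
$dom = new DOMDocument();
$dom->loadHTML($your_html_here);
$script = $dom->getElementsByTagName('script')->item(0);

// Parse the script's text as HTML so the embedded <img> tags become nodes.
$scriptDom = new DOMDocument();
$scriptDom->loadHTML($script->nodeValue);
foreach ($scriptDom->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute("src");
    $alt = $img->getAttribute("alt");
    $title = $img->getAttribute("title");
    // do something
}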
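
For reference, here is a slightly fuller sketch of the same approach, wrapped in a function that returns the `src`, `alt` and `title` of every image. The function name `extractScriptImages` and the sample `$html` are only illustrative, and the sketch assumes the image array lives in the first `<script>` element on the page and that its strings are properly quoted:

```php
<?php
// Hypothetical helper: collects src, alt and title from <img> tags that are
// embedded as strings inside the first <script> element of a page.
function extractScriptImages(string $html): array
{
    // Collect libxml warnings instead of printing them, and remember the old setting.
    $previous = libxml_use_internal_errors(true);

    $page = new DOMDocument();
    $page->loadHTML($html);
    $script = $page->getElementsByTagName('script')->item(0);

    $images = [];
    // loadHTML() rejects an empty string, so skip missing or empty scripts.
    if ($script !== null && trim($script->nodeValue) !== '') {
        // Parse the JavaScript source as if it were HTML: the parser turns
        // the <img> tags into elements and keeps the rest as plain text.
        $fragment = new DOMDocument();
        $fragment->loadHTML($script->nodeValue);
        foreach ($fragment->getElementsByTagName('img') as $img) {
            $images[] = [
                'src'   => $img->getAttribute('src'),
                'alt'   => $img->getAttribute('alt'),
                'title' => $img->getAttribute('title'),
            ];
        }
    }

    // Restore the previous libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $images;
}

// Sample input modelled on the markup in the question.
$html = <<<'HTML'
<body>
  <div>something</div>
  <script type="text/javascript">
      var imgs = ['<img alt="image1" class="happy-image" src="http://example.com/image1.jpg" title = "Image 1">', '<img alt="image2" class="happy-image" src="http://example.com/image2.jpg" title = "Image 2">'];
  </script>
</body>
HTML;

foreach (extractScriptImages($html) as $image) {
    echo $image['src'], ' | ', $image['alt'], ' | ', $image['title'], PHP_EOL;
}
```

If the page has several scripts, you would need to loop over all `script` elements rather than taking `item(0)`.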
Niet the Dark Absol