0

I have a lot of images in my HTML that all share the id images. Here is one example:

<img id="images" src="video images/the wind rises.jpg" alt="" width="700" height="525" class="the-wind-rises1" />

I want to collect all of the src values (e.g. video images/the wind rises.jpg). I have tried the code below, but it is not working. Why?

<?php
$html = file_get_contents('http://urlofwebsite.co.uk/xxxx');

function linkExtractor($html){
    $imageArr = array();
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $images = $doc->getElementById('images');
    foreach($images as $image) {
        array_push($imageArr, $image->getAttribute('src'));
    }
    return $imageArr;
}

echo json_encode(array("images" => linkExtractor($html)));
?>

It is just returning:

{"images":[]}
maxisme
  • First, you can only legally have one element with any given `id`, so this is always going to be nasty. Second, using `@` is rarely if ever a good idea. – lonesomeday May 21 '14 at 15:39
  • Or you can get the images by tag name: `$doc->getElementsByTagName('img');` – ɹɐqʞɐ zoɹǝɟ May 21 '14 at 15:41
  • @lonesomeday Sometimes you need it, for example when you are parsing data and do not want to send a warning to the frontend (user). But you should always check whether the call succeeded, and if it did not, implement custom error handling (`throw new XmlNotValidException()`). – Christian Gollhardt May 21 '14 at 16:08
  • @ChristianGollhardt Why are you running with error messages enabled on your production server? – lonesomeday May 21 '14 at 16:10
  • @lonesomeday That's not what I said. But why would you want to continue if your application is not in the state you expect? Sure, in production I see no warnings either, but in development should I see so many warnings that I miss the ones I am interested in? If you do your own exception handling, it feels fine to use `@`. – Christian Gollhardt May 21 '14 at 16:14
  • @ChristianGollhardt But error messages represent something that needs fixing in your code. In this instance, the correct approach uses libxml_use_internal_errors – lonesomeday May 21 '14 at 16:17
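
A minimal sketch of the `libxml_use_internal_errors` approach mentioned in the last comment, as an alternative to `@`. The sample markup and the `error_log` reporting are assumptions for illustration, not part of the original question:

```php
<?php
// Hypothetical sample markup; libxml may complain about the unknown <video-card> tag.
$html = '<html><body><video-card><img src="a.jpg"></video-card></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Read what the parser complained about, then clear the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($loaded === false) {
    throw new RuntimeException('Could not parse the HTML');
}

foreach ($errors as $error) {
    // Each entry is a LibXMLError; message and line are public properties.
    error_log(trim($error->message) . ' on line ' . $error->line);
}
```

This way nothing is silently discarded: the warnings stay out of the page output, but they can still be logged or turned into an exception.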

2 Answers

4

You are using getElementById, which returns a single element or null rather than a list, so the foreach has nothing to iterate over. See: http://www.php.net/manual/en/domdocument.getelementbyid.php

I would say try the below:

$image = $doc->getElementById('images');
// getElementById returns null when no element has that id
return $image !== null ? $image->getAttribute('src') : null;

If your intention is to collect the sources of all images, you will have to use getElementsByTagName: http://www.php.net/manual/en/domdocument.getelementsbytagname.php

function linkExtractor($html){
    $imageArr = array();
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $images = $doc->getElementsByTagName('img');
    foreach($images as $image) {
        array_push($imageArr, $image->getAttribute('src'));
    }
    return $imageArr;
}
Mehdi Karamosly
  • my intention is to collect sources for all images with id `images` – maxisme May 21 '14 at 15:44
  • In HTML the `id` attribute should be unique; you can use `class` in your HTML instead. Please read the id section of http://www.w3.org/TR/html401/struct/global.html#h-7.5.2 – Mehdi Karamosly May 21 '14 at 15:46
  • I tried this after changing the id to a class, but I am getting an error: `Call to undefined method DOMDocument::getElementByClass()` – maxisme May 21 '14 at 15:59
  • I don't see any function called `getElementsByClass` in the `DOMDocument` API page. I would say you have two options: either get the elements by tag name, loop through them, and pick the ones whose class is 'images', or use selectors as in this post: http://stackoverflow.com/questions/6366351/getting-dom-elements-by-class-name – Mehdi Karamosly May 21 '14 at 16:04
  • How would you get the elements by tag name, loop through them, and pick only the ones whose class is 'images'? – maxisme May 21 '14 at 16:13
  • This example is helpful: http://www.php.net/manual/en/domelement.getattribute.php#79074. Look at how it loops through elements and reads attributes. In your case you would use `getAttribute('class')`, and you will need some debugging with `var_dump` and `echo` to see what you are getting and what further processing is needed to check the class. – Mehdi Karamosly May 21 '14 at 16:19
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/54138/discussion-between-mehdi-karamosly-and-maximilian). – Mehdi Karamosly May 21 '14 at 16:35
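
The first option discussed in the comments above (get elements by tag name, then keep only those with a given class) could look roughly like this. It is a sketch under assumptions: the function name `imagesByClass` and the sample markup are hypothetical, and it assumes the shared id in the HTML has been replaced by a class:

```php
<?php
// Sketch: collect the src of every <img> whose class list contains $className.
function imagesByClass($html, $className) {
    $sources = array();

    // Keep parser warnings out of the output without using @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $image) {
        // The class attribute may hold several space-separated names,
        // so split it before comparing.
        $classes = preg_split('/\s+/', trim($image->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $sources[] = $image->getAttribute('src');
        }
    }
    return $sources;
}

// Hypothetical sample markup, with class="images" instead of id="images".
$sample = '<html><body>'
    . '<img class="images the-wind-rises1" src="video images/the wind rises.jpg" alt="">'
    . '<img class="logo" src="logo.png" alt="">'
    . '</body></html>';

echo json_encode(array('images' => imagesByClass($sample, 'images')));
```

Splitting the class attribute on whitespace matters because an element can carry more than one class, as in the question's own example image.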
0

Because an ID is (or should be) unique, getElementById returns only one element:

$images = $doc->getElementById('images');
array_push($imageArr, $images->getAttribute('src'));

Docs: http://www.php.net/manual/en/domdocument.getelementbyid.php

Peter