57

I would like to get the SRC attribute into a variable in this example:

<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />

For example, I would like to end up with the variable $foo = "/images/image.jpg". Important: the src attribute will be dynamic, so it must not be hardcoded. Is there a quick and easy way to do this?

Thanks!

EDIT: The image will be a part of a huge string that is basically the content of a news story. So the image is just a part of that.

EDIT2: There will be more images in this string, and I would only want to get the src of the first one. Is this possible?

pangi

7 Answers

120

Use an HTML parser like DOMDocument and then evaluate the value you're looking for with DOMXPath:

$html = '<img id="12" border="0" src="/images/image.jpg"
         alt="Image" width="100" height="100" />';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg"

Or for those who really need to save space:

@($doc = new DOMDocument())->loadHTML($html); // static DOMDocument::loadHTML() calls fail on PHP 8+
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");

And for the one-liners out there:

// $doc is the DOMDocument loaded above; loadHTML() cannot be called statically on PHP 8+
$src = (string) reset(simplexml_import_dom($doc)->xpath("//img/@src"));
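
For the case from the question's edits (a long news-story string with several images, where only the first src is wanted), a minimal sketch could look like this. The $story markup is made up for illustration. `(//img)[1]/@src` reads the src of the first img element, while `//img/@src` inside string() reads the first src attribute found; the two differ only when the first image has no src.

```php
<?php
// Hypothetical news-story markup; only the first <img> matters here.
$story = '<p>Intro text</p>'
       . '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />'
       . '<p>More text</p>'
       . '<img src="/images/second.jpg" alt="Second" />';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($story);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// (//img)[1] selects the first <img> in document order;
// string() yields an empty string when nothing matches.
$foo = $xpath->evaluate('string((//img)[1]/@src)');

echo $foo, "\n";
```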
hakre
  • This seems to get one image. Any way to get all the images in the HTML? – jim smith May 30 '15 at 10:52
  • 1
    @jimsmith: Remove the string cast and the reset call and you have an array of all SRC attributes (as SimpleXMLElements). – hakre May 31 '15 at 11:23
  • How can I use $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg" inside a for loop? – dtanwar Mar 23 '17 at 08:08
  • 1
    @dtanwar: By not using a single string() evaluation but by obtaining all the @src attribute nodes via a query: `$xpath->query('//img/@src')`. This returns a query result you can loop over, see https://php.net/domxpath.query for an example and more detailed documentation. – hakre Mar 26 '17 at 13:14
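
A minimal sketch of the loop described in the last comment, assuming made-up markup with two images. `$xpath->query('//img/@src')` returns a DOMNodeList of attribute nodes, and each node's value is read through nodeValue.

```php
<?php
// Hypothetical markup with two images.
$html = '<img src="/images/a.jpg" alt="A" /><img src="/images/b.jpg" alt="B" />';

libxml_use_internal_errors(true); // keep libxml warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$sources = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue; // each $attr is a DOMAttr node
}

print_r($sources);
```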
22

You would be better off using a DOM parser for this kind of HTML parsing. Consider this code:

$html = '<img id="12" border="0" src="/images/image.jpg"
         alt="Image" width="100" height="100" />';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img"); // find your image
$node = $nodelist->item(0); // gets the 1st image
$value = $node->attributes->getNamedItem('src')->nodeValue;
echo "src=$value\n"; // prints src of image

OUTPUT:

src=/images/image.jpg
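
If the string might contain no image at all, `item(0)` returns null and the attribute lookup above would fail. A possible guarded variant is sketched below; the function name firstImageSrc is made up for illustration and is not part of any library.

```php
<?php
// Hypothetical helper: returns the src of the first <img> that has one, or null.
function firstImageSrc(string $html): ?string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // loadHTML() rejects an empty string, so guard against it first.
    $loaded = $html !== '' && $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // //img[@src] skips images that carry no src attribute at all.
    $node = (new DOMXPath($doc))->query('//img[@src]')->item(0);
    return $node instanceof DOMElement ? $node->getAttribute('src') : null;
}

var_dump(firstImageSrc('<p>no image here</p>'));
var_dump(firstImageSrc('<img src="/images/image.jpg" alt="Image" />'));
```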
anubhava
  • 1
    For more extensive HTML parsing, I completely agree, but for this it's simply overkill. Your code is longer, slower, and harder to read. – kba Apr 12 '12 at 20:10
  • @KristianAntonsen: How can you say this code is `slower` than regex? Do you have any benchmarking to support this behavior? – anubhava Apr 12 '12 at 20:15
  • 1
    @anubhava I would say that it's both obvious and common sense. You're loading a heavy library and initializing objects. But since you asked, I made a small benchmark comparing our codes. 100,000 executions takes about **0,49 seconds** with my code. It takes **6,2 seconds** with your code. – kba Apr 12 '12 at 20:39
  • 2
    @KristianAntonsen: That benchmark is cheating, because pcre caches compiled regexes per request. That means it really compiles once and fetches the precompiled result the other 99,999 times. You need to compare 100,000 requests against each other, not only function calls, to come closer to reality. Microbenchmarking can often mislead with regexes. – hakre Apr 12 '12 at 22:01
  • 1
    With a single execution, it's still more than twice as fast. Either way, I don't see a reason to discuss this. If you find your code easier to read (or whatever quality parameter you use), stick to it, and I'll stick to mine. – kba Apr 12 '12 at 22:19
  • Will this work if there are more images? So if I have 2 images, and I only want the src of the first one. – pangi Apr 15 '12 at 18:15
  • @JernejPangeršič: Yes it will work for that case also since I'm using `$node = $nodelist->item(0);` which is getting very first image. – anubhava Apr 16 '12 at 05:51
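
For anyone who wants to repeat the timing comparison from the comments, a rough sketch follows. The function names and the run count are made up, hrtime() needs PHP 7.3 or later, and, as pointed out above, a loop inside a single process reuses the cached compiled regex, so the numbers only say something about repeated calls within one request.

```php
<?php
$html = '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />';

// Hypothetical wrappers around the two approaches discussed in this thread.
function srcViaDom(string $html): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($doc))->evaluate('string(//img/@src)');
}

function srcViaRegex(string $html): string
{
    return preg_match('/<img[^>]+src="([^"]+)"/i', $html, $m) ? $m[1] : '';
}

// Runs $fn repeatedly and returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn, string $html, int $runs): float
{
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn($html);
    }
    return (hrtime(true) - $start) / 1e9;
}

$runs = 1000; // arbitrary; raise it for steadier numbers
printf("DOM:   %.4f s\n", timeIt('srcViaDom', $html, $runs));
printf("regex: %.4f s\n", timeIt('srcViaRegex', $html, $runs));
```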
18

I did it a simpler way. It's not as clean as it should be, but it works as a quick hack:

$htmlContent = file_get_contents('pageURL');

// read all image tags into an array
preg_match_all('/<img[^>]+>/i', $htmlContent, $imgTags);

$origImageSrc = [];
foreach ($imgTags[0] as $imgTag) {
  // capture group 1 holds only the value between the quotes after src=
  if (preg_match('/src="([^"]+)/i', $imgTag, $image)) {
    $origImageSrc[] = $image[1];
  }
}
// will output all your img src's within the html string
print_r($origImageSrc);
Torsten
13

I know people say you shouldn't use regular expressions to parse HTML, but in this case I find it perfectly fine.

$string = '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />';
preg_match('/<img(.*)src(.*)=(.*)"(.*)"/U', $string, $result);
$foo = array_pop($result);
kba
  • The problem is that this regex is specific to this variable. What if you wanted to get the `src` from another image? – gen_Eric Apr 12 '12 at 20:02
  • @Rocket The regex above is not specific to that variable. This will work with all (I believe) `img` tags that have a `src` attribute. – kba Apr 12 '12 at 20:04
  • 2
    it will fail if there's a space before or after the equal `` – Adri V. Apr 12 '12 at 20:07
  • 2
    @AdrianaVillafañe: Isn't that not valid HTML anyway? – gen_Eric Apr 12 '12 at 20:07
  • @AdrianaVillafañe Now it will match that as well. – kba Apr 12 '12 at 20:18
  • That's the point. Not every website "in the wild" has perfectly valid HTML. This code renders, and browsers show the image, and for many people that's all that matters (even if it's not valid) : `` – Adri V. Apr 12 '12 at 20:21
  • @AdrianaVillafañe As I said, I've updated the answer. It will now match. – kba Apr 12 '12 at 20:29
  • 1
    I deleted my previous comment. But I now add two more: Case 1: `` (src without quotes) | Case 2 : `` (src surrounded with single quotes) – Adri V. Apr 12 '12 at 20:34
  • Case 3 : `` – Adri V. Apr 12 '12 at 20:43
  • 1
    All OP was asking was something to match his example using `"`s and no spaces. I know there is a reason why the DOM class is so much slower than a simple regex - one of these being it takes all these edge-cases into consideration, but it doesn't change the fact that sometimes the biggest tool isn't the best. – kba Apr 12 '12 at 20:43
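
For those who stay with a regex, here is a sketch of one pattern that covers the cases described in the comments above: whitespace around the equals sign, single quotes, and an unquoted value. The helper name is made up. It remains a regex, so a `>` inside an attribute value or a literal " src=" inside another attribute's value will still confuse it.

```php
<?php
// Hypothetical helper: returns the src of the first <img>, or null if none matches.
function firstSrcByRegex(string $html): ?string
{
    // (?| ... ) is a branch-reset group, so the value always lands in capture
    // group 1 whether it was double-quoted, single-quoted, or unquoted.
    // \s before src keeps attributes such as data-src from matching.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

var_dump(firstSrcByRegex('<img alt="x" src = "/images/a.jpg">'));
var_dump(firstSrcByRegex("<img src='/images/b.jpg' />"));
var_dump(firstSrcByRegex('<img src=/images/c.jpg alt=x>'));
```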
7
$imgTag = <<< LOB
<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />
<img border="0" src="/images/not_match_image.jpg" alt="Image" width="100" height="100" />
LOB;

preg_match('%<img.*?src=["\'](.*?)["\'].*?/>%i', $imgTag, $matches);
$imgSrc = $matches[1];


NOTE: You should use an HTML Parser like DOMDocument and NOT a regex.

Community
Pedro Lobito
4
$str = '<img border="0" src=\'/images/image.jpg\' alt="Image" width="100" height="100"/>';

preg_match('/(src=["\'](.*?)["\'])/', $str, $match);  //find src="X" or src='X'
$split = preg_split('/["\']/', $match[0]); // split by quotes

$src = $split[1]; // X between quotes

echo $src;

Another regex can be used to check whether the extracted src points to an image, like so:

if (preg_match('/\.(jpe?g|gif|bmp|png)$/i', $src) === 1) {
    // it's an image (the original [jpg]{3} was a character class and also matched e.g. "gpj")
}
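
An alternative sketch of the same check without a regex, using parse_url() and pathinfo(). The helper name and the extension list are assumptions. It also copes with a query string after the file name.

```php
<?php
// Hypothetical helper: true when the URL path ends in a common image extension.
function looksLikeImage(string $src): bool
{
    $path = parse_url($src, PHP_URL_PATH); // drops ?query and #fragment
    if (!is_string($path)) {
        return false; // malformed URL or no path component
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['jpg', 'jpeg', 'gif', 'bmp', 'png'], true);
}

var_dump(looksLikeImage('/images/image.jpg'));
var_dump(looksLikeImage('/images/image.PNG?v=2'));
var_dump(looksLikeImage('/images/gpj'));
```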
squarephoenix
-1

There are two easy options:

  1. If the markup is well-formed XML (HTML in general is not, but a self-closing tag like this one is), you can load it with any XML parser and read its attributes dynamically, including data attributes such as data-time.
  2. Use an HTML parser for PHP, such as http://mbe.ro/2009/06/21/php-html-to-array-working-one/, or search for "php parse html to array".
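
A minimal sketch of the first option, assuming the tag is well-formed XML as in the question. The data-time attribute is added only for illustration. Real-world HTML (an unclosed `<img>`, entities like `&nbsp;`) will usually make simplexml_load_string() return false, which is why the result is checked.

```php
<?php
// The question's tag plus a hypothetical data-time attribute.
$tag = '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" data-time="12" />';

$foo = null;
$img = simplexml_load_string($tag); // returns false when the markup is not well-formed XML
if ($img !== false) {
    $foo = (string) $img['src'];

    // Every attribute is available by name, including data attributes.
    foreach ($img->attributes() as $name => $value) {
        echo $name, '=', $value, "\n";
    }
}
```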
HamZa
Jitendra