Regex & PHP - isolate src attribute from img tag

Question

With PHP, how can I isolate the contents of the src attribute from $foo? The end result I'm looking for would give me just "http://example.com/img/image.jpg"

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

@meagar - Using regex is valid (although not necessarily the most efficient route) in this limited scope. — John Parker, Jan 22 '10 at 21:51
I misspoke with the original post title and shouldn't have added regex. I really like karim79's solution, but it requires adding a non-standard class. — Jeff, Jan 22 '10 at 22:49
Does this answer your question? [How do you parse and process HTML/XML in PHP?](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) — TylerH, Feb 08 '23 at 04:01

score 76 · Accepted Answer · answered Jan 22 '10 at 22:14

If you don't wish to use regex (or any non-standard PHP components), a reasonable solution using the built-in DOMDocument class would be as follows:

<?php
    $doc = new DOMDocument();
    $doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
    $imageTags = $doc->getElementsByTagName('img');

    foreach($imageTags as $tag) {
        echo $tag->getAttribute('src');
    }
?>
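
For input that contains several images or slightly invalid markup, the same idea can be wrapped in a small helper. This is only a sketch: the function name `extractImageSources` is made up for illustration, and it assumes the HTML string is non-empty.

```php
<?php
// Hypothetical helper: collect the src of every <img> in an HTML fragment.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $tag) {
        // Skip images that have no src attribute at all.
        if ($tag->hasAttribute('src')) {
            $sources[] = $tag->getAttribute('src');
        }
    }
    return $sources;
}

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
print_r(extractImageSources($foo));
```

Returning an array keeps the caller in charge of what to do when there are zero or several images.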

score 40 · Answer 2 · edited Jun 20 '20 at 09:12

Code

<?php
    $foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
    $array = array();
    preg_match( '/src="([^"]*)"/i', $foo, $array ) ;
    print_r( $array[1] ) ;

Output

http://example.com/img/image.jpg

answered Jan 22 '10 at 21:54 by St.Woland

Look out for `&amp;` entity references and numeric character references in the results! – bobince Jan 22 '10 at 22:05
As you wish! =) Here is an alternative syntax: `/src="(.*?)"/i`. – Alix Axel Jan 22 '10 at 22:38
HTML permits use of single quotes, as long as they match. And the "alternative syntax" can match a lot more chars than expected. Finally, the `img` attribute can have spaces in the beginning and end. – XedinUnknown Jun 23 '16 at 08:08
it should be: `/[sS][rR][cC]\s*=\s*['"]([^'"]+)['"]/i` – jewelnguyen8 Nov 27 '18 at 10:30
@jewel why make case insensitive character classes and also write the case insensitive pattern modifier at the end? That makes no sense and makes the pattern smell. – mickmackusa Feb 07 '23 at 08:00
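
Taking the comments above into account (single or double quotes, optional whitespace around `=`, entity references in the value), a regex-based version might look like the sketch below. The function name `firstSrcByRegex` is hypothetical, and the pattern is still not a real HTML parser; it only covers the cases listed here.

```php
<?php
// Hypothetical helper: regex-based extraction of the first src value.
function firstSrcByRegex(string $html): ?string
{
    // (["\']) captures the opening quote; \1 requires the same closing quote.
    // \ssrc requires whitespace before "src", so "data-src" is not matched.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';

    if (preg_match($pattern, $html, $m) !== 1) {
        return null; // no quoted src attribute found
    }

    // Decode entity references such as &amp; and numeric references such as &#38;.
    return html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$tag = "<IMG alt='a' SRC = 'http://example.com/i.php?a=1&amp;b=2'>";
echo firstSrcByRegex($tag), PHP_EOL;
```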

score 9 · Answer 3 · answered Jan 23 '10 at 01:44

I got this code:

$dom = new DOMDocument();
$dom->loadHTML($img);
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src');

Assuming there is only one img :P

answered Jan 23 '10 at 01:44 by AntonioCS

score 7 · Answer 4 · answered Jan 22 '10 at 22:11

// Create DOM from string
$html = str_get_html('<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />');

// echo the src attribute
echo $html->find('img', 0)->src;

http://simplehtmldom.sourceforge.net/

score 4 · Answer 5 · answered Jul 10 '15 at 22:01

I'm extremely late to this, but I have a simple solution not yet mentioned. Load it with simplexml_load_string (if you have simplexml enabled) and then flip it through json_encode and json_decode.

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

$parsedFoo = json_decode(json_encode(simplexml_load_string($foo)), true);
var_dump($parsedFoo['@attributes']['src']); // output: "http://example.com/img/image.jpg"

$parsedFoo comes through as

array(1) {
  ["@attributes"]=>
  array(6) {
    ["class"]=>
    string(12) "foo bar test"
    ["title"]=>
    string(10) "test image"
    ["src"]=>
    string(32) "http://example.com/img/image.jpg"
    ["alt"]=>
    string(10) "test image"
    ["width"]=>
    string(3) "100"
    ["height"]=>
    string(3) "100"
  }
}

I've been using this for parsing XML and HTML for a few months now and it works pretty well. I've had no hiccups yet, though I haven't had to parse a large file with it (I imagine using json_encode and json_decode like that will get slower the larger the input gets). It's convoluted, but it's by far the easiest way to read HTML properties.

I did find a small issue with this last week. If an XML node has both attributes and a value, only the value is accessible with this method. I ended up having to write a simple parser that will transform simplexml into an array while keeping all data. — Josh, Jul 27 '15 at 21:42
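
A possibly simpler variant, sketched here under the assumption that the input is well-formed XML: a `SimpleXMLElement` exposes attributes through array-style access, so the JSON round trip can be skipped, and both attributes and text content stay accessible. The `<a>` element is just an illustrative input.

```php
<?php
$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

// simplexml_load_string() returns false if the input is not well-formed XML.
$element = simplexml_load_string($foo);
if ($element !== false) {
    // Array-style access reads an attribute; the cast turns it into a string.
    $src = (string) $element['src'];
    echo $src, PHP_EOL;
}

// An element with both attributes and text content keeps both this way.
$link = simplexml_load_string('<a href="/about">About us</a>');
$href = (string) $link['href']; // attribute value
$text = (string) $link;         // text content
```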

score 2 · Answer 6 · answered Jan 22 '10 at 22:51

Here's what I ended up doing, although I'm not sure about how efficient this is:

$imgsplit = explode('"',$data);
foreach ($imgsplit as $item) {
    if (strpos($item, 'http') !== FALSE) {
        $image = $item;
        break;
    }
}

answered Jan 22 '10 at 22:51 by Jeff

this approach will run into problems if the image's URL is relative to the document, e.g. "../../img/something.jpg" – tomfumb May 03 '12 at 22:47

score 1 · Answer 7 · answered Jan 22 '10 at 23:37

You can work around this problem with this function:

function getTextBetween($start, $end, $text)
{
 $start_from = strpos($text, $start);
 $start_pos = $start_from + strlen($start);
 $end_pos = strpos($text, $end, $start_pos + 1);
 $subtext = substr($text, $start_pos, $end_pos);
 return $subtext;
}

$foo = '<img class="foo bar test" title="test image" 
src="http://example.com/img/image.jpg" alt="test image"
width="100" height="100" />';

$img_src = getTextBetween('src="', '"', $foo);
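
One thing to watch in the function above: `substr()` takes a length as its third argument, not an end position, so the result can run past the closing delimiter. A sketch of a variant that subtracts the start offset and handles missing delimiters follows; the name `textBetween` is hypothetical, chosen only to avoid clashing with the original function.

```php
<?php
// Hypothetical corrected variant; returns null when a delimiter is missing.
function textBetween(string $start, string $end, string $text): ?string
{
    $startFrom = strpos($text, $start);
    if ($startFrom === false) {
        return null;
    }
    $startPos = $startFrom + strlen($start);

    // Search for the end delimiter from the first character after $start.
    $endPos = strpos($text, $end, $startPos);
    if ($endPos === false) {
        return null;
    }

    // substr() expects a length, so subtract the start offset.
    return substr($text, $startPos, $endPos - $startPos);
}

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
echo textBetween('src="', '"', $foo), PHP_EOL;
```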

Harsh Patel · Answer 8 · 2023-02-02T12:56:03.110

<?php
    $html = '
        <img border="0" src="/images/image1.jpg" alt="Image" width="100" height="100" />
        <img border="0" src="/images/image2.jpg" alt="Image" width="100" height="100" />
        <img border="0" src="/images/image3.jpg" alt="Image" width="100" height="100" />
        ';
    
    // Capture the src path of each <img> tag.
    $get_Img_Src = '/<img[^>]*src=([\'"])(?<src>.+?)\1[^>]*>/i';
    
    preg_match_all($get_Img_Src, $html, $result); 
    if (!empty($result['src'])) {
        echo $result['src'][0];
        echo $result['src'][1];
    }

To get the src path and the alt text together, use the regex below instead of the one above. The backreference for the alt quote is \3, because named groups are numbered as well (src is group 2, alt is group 4).

<img[^>]*src=(['"])(?<src>.+?)\1[^>]*alt=(['"])(?<alt>.+?)\3[^>]*>

    // Capture both the src path and the alt text (src must come before alt in the tag).
    $get_Img_Src = '/<img[^>]*src=([\'"])(?<src>.+?)\1[^>]*alt=([\'"])(?<alt>.+?)\3[^>]*>/i';
    
    preg_match_all($get_Img_Src, $html, $result); 
    if (!empty($result['src'])) {
        echo $result['src'][0];
        echo $result['src'][1];
        echo $result['alt'][0];
        echo $result['alt'][1];
    }

I got the idea for this solution from the question "PHP extract link from a href tag".

To extract only the URLs of a specific domain, try the regex below.

// e.g. if you only need to extract URLs on "test.com",
// you can use a regex like this:

<a[^>]+href=([\'"])(?<href>(https?:\/\/)?test\.com.*?)\1[^>]*>

Yes, but if we want to validate form data or manipulate an HTML string, we can use regex for the extraction. I used the regex above in my project, which is why I'm sharing this regex solution for extracting the src path. — Harsh Patel, Oct 26 '21 at 13:05

score 0 · Answer 9 · answered Jan 22 '10 at 21:53

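Try this pattern (the x modifier makes the whitespace inside the pattern insignificant; without it the spaces would have to match literally):

'/< \s* img [^\>]* src \s* = \s* [\""\']? ( [^\""\'\s>]* )/x'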

answered Jan 22 '10 at 21:53 by user256058

This won't work if img is capitalized or if the title contains a '>'. It would be more robust to use an HTML parser. – Mark Byers Jan 22 '10 at 21:57

score 0 · Answer 10 · answered Jan 24 '21 at 01:37

I use preg_match_all to capture all images in HTML document:

preg_match_all("~<img.*src\s*=\s*[\"']([^\"']+)[\"'][^>]*>~i", $body, $matches);

This one allows more relaxed syntax of declaration, with spaces and different quote types.

Regex reads like <img (any attributes like style or border) src (possible space) = (possible space) (' or ") (any non-quote symbol) (' or ") (anything until >) (>)
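
A short usage sketch with made-up sample markup: the captured paths end up in `$matches[1]`. It also shows one caveat of the greedy `.*`: when two tags share a line, they are consumed as a single match and only the last `src` on that line is captured.

```php
<?php
$pattern = "~<img.*src\s*=\s*[\"']([^\"']+)[\"'][^>]*>~i";

// Sample input (an assumption for illustration): one image per line.
$body = <<<'HTML'
<p><img class="a" src="/images/one.jpg" alt="One"></p>
<p><IMG SRC = '/images/two.png'></p>
HTML;

preg_match_all($pattern, $body, $matches);
print_r($matches[1]); // the captured src values, one per matched tag

// Caveat: two tags on the same line are swallowed by the greedy ".*".
$sameLine = '<img src="a.jpg"><img src="b.jpg">';
preg_match_all($pattern, $sameLine, $sameLineMatches);
print_r($sameLineMatches[1]); // only the last src on the line
```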

score -1 · Answer 11 · answered Aug 01 '16 at 12:24

Let's assume I use

$text ='<img src="blabla.jpg" alt="blabla" />';

in

getTextBetween('src="','"',$text);

The code will return something like:

blabla.jpg" alt="blabla"

which is wrong. We want the code to return only the text between the attribute value quotes, i.e. attr="value".

So

function getTextBetween($start, $end, $text)
{
    // Split once on the start string and keep everything after it.
    // end() takes its argument by reference, so store the array in a variable first.
    $parts = explode($start, $text, 2);
    $first_strip = end($parts);

    // Split on the end string and keep everything before it.
    $final_strip = explode($end, $first_strip)[0];
    return $final_strip;
}

does the trick!

Now

getTextBetween('src="','"',$text);

will return:

blabla.jpg

Thanks all the same, because your solution gave me the insight that led to the final solution.

I do not really want to say your approach is bad, but I do think using DOMDocument would be a much better solution to this question. See this for example: http://stackoverflow.com/questions/6441448/how-do-i-get-the-src-attribute-of-img-tags — Abela, Aug 01 '16 at 12:52
The DOMDocument library is too heavy to use for such a simple task. That's like using a bulldozer to crush a snake when you have a cutlass alternative. — Victor Oni, Oct 11 '16 at 08:34
