Matching SRC attribute of IMG tag using preg_match

Question

I'm attempting to run preg_match to extract the SRC attribute from the first IMG tag in an article (in this case, stored in $row->introtext).

preg_match('/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i', $row->introtext, $matches);

Instead of getting something like

images/stories/otakuzoku1.jpg

from

<img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku's store" />

I get just

The regex should be right, but I can't tell why it appears to be matching the border attribute and not the src attribute.

Alternatively, if you've had the patience to read this far without skipping straight to the reply field and typing 'use an HTML/XML parser', can you recommend a good tutorial for one? I'm having trouble finding any that apply to PHP 4.

PHP 4.4.7

score 46 · Accepted Answer · answered Feb 01 '10 at 21:45

Your expression is incorrect. Try:

preg_match('/< *img[^>]*src *= *["\']?([^"\']*)/i', $row->introtext, $matches);

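Note the removal of the brackets around img and src: [img] and [src] are character classes that each match a single character, not the literal words. The other cleanups drop the unnecessary backslash escapes and shorten {0,1} to ?.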
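
To see the difference concretely, here is a small sketch that runs both patterns against the sample tag from the question. The variable names ($html, $broken, $fixed and the two match arrays) are illustrative only and are not from the original post.

```php
<?php
$html = '<img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" />';

// Pattern from the question: [img] and [src] are character classes,
// so each one matches a single character rather than the whole word.
$broken = '/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i';
preg_match($broken, $html, $brokenMatches);

// Pattern from this answer: img and src are matched as literal words.
$fixed = '/< *img[^>]*src *= *["\']?([^"\']*)/i';
preg_match($fixed, $html, $fixedMatches);

// The greedy [^>]* backtracks to the last place where one of s, r, c is
// followed by "=", which is the final "r" of border=.
echo $brokenMatches[1], "\n"; // 0
echo $fixedMatches[1], "\n";  // images/stories/otakuzoku1.jpg
```

Because [^>]* is greedy, the corrected pattern also picks the last src= inside the tag, which only matters if a tag carries more than one attribute whose name ends in src.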

answered Feb 01 '10 at 21:45 by CalebD

This did the trick. Not the 'ideal' solution of actually parsing the HTML, but the one solution that works and gives the necessary result. Thanks! – KyokoHunter Feb 07 '10 at 00:56

As a side note, $matches[0] contains the matched text from the start of the IMG tag through the src value, and $matches[1] contains the source URI. – Talvi Watia Dec 17 '12 at 18:18

GZipp · Answer 2 · 2010-02-01T22:36:26.673

5

Here's a way to do it with built-in functions (PHP >= 4):

// $html holds the markup to search
$parser = xml_parser_create();
xml_parse_into_struct($parser, $html, $values);

$first_src = null;
foreach ($values as $val) {
    // The XML parser upper-cases tag and attribute names by default
    if ($val['tag'] == 'IMG' && isset($val['attributes']['SRC'])) {
        $first_src = $val['attributes']['SRC'];
        break;
    }
}

echo $first_src;  // images/stories/otakuzoku1.jpg

answered Feb 01 '10 at 22:23 by GZipp (edited Feb 01 '10 at 22:36)

score 4 · Answer 3 · answered Jun 28 '13 at 18:09

If you need to use preg_match() itself, try this:

preg_match('/(?<!_)src=([\'"])?(.*?)\\1/', $content, $matches);
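
As a rough illustration of how this pattern behaves, the sketch below runs it on a made-up tag. The data_src attribute and its value are hypothetical and only there to show what the (?<!_) lookbehind skips. Note that the quote character lands in $matches[1] and the URL in $matches[2].

```php
<?php
// Hypothetical input: data_src is invented to exercise the lookbehind.
$content = '<img data_src="lazy.jpg" src="images/stories/otakuzoku1.jpg" border="0" />';

preg_match('/(?<!_)src=([\'"])?(.*?)\\1/', $content, $matches);

// (?<!_) rejects the "src=" that follows an underscore (data_src),
// so the match starts at the plain src attribute.
echo $matches[1], "\n"; // "
echo $matches[2], "\n"; // images/stories/otakuzoku1.jpg
```

The backreference \1 needs the captured quote to close the value, so this sketch only covers quoted attributes.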

answered Jun 28 '13 at 18:09 by Ajmal Salim

score 2 · Answer 4 · answered Feb 01 '10 at 21:50

Try:

include ("htmlparser.inc"); // from: http://php-html.sourceforge.net/

$html = 'bla <img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" /> noise <img src="das" /> foo';

$parser = new HtmlParser($html);

while($parser->parse()) {
    if($parser->iNodeName == 'img') {
        echo $parser->iNodeAttributes['src'];
        break;
    }
}

which will produce:

images/stories/otakuzoku1.jpg

It should work with PHP 4.x.

answered Feb 01 '10 at 21:50 by Bart Kiers

Some problems getting htmlparser.inc to work. Error message says the class is already initiated, but it isn't. I'll hold out for a provider upgrade to PHP 5... – KyokoHunter Feb 07 '10 at 00:58
Have you tried `include_once('htmlparser.inc');` instead of `include('htmlparser.inc');`? – Bart Kiers Feb 07 '10 at 07:29

score 1 · Answer 5 · edited May 23 '17 at 10:30

The regex I used was much simpler. My code assumes that the string being passed to it contains exactly one img tag with no other markup:

$pattern = '/src="([^"]*)"/';

See my answer here for more info: How to extract img src, title and alt from html using php?
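
A minimal sketch of that assumption in practice: the first input is the kind of string the pattern was written for, while the second (a hypothetical snippet, not from the original post) shows what happens once other markup is present.

```php
<?php
$pattern = '/src="([^"]*)"/';

// Input the pattern was written for: one img tag and nothing else.
$single = '<img src="images/stories/otakuzoku1.jpg" border="0" />';
preg_match($pattern, $single, $m1);

// Hypothetical input with other markup: the first src="..." wins,
// regardless of which tag it belongs to.
$mixed = '<script src="app.js"></script><img src="photo.jpg" />';
preg_match($pattern, $mixed, $m2);

echo $m1[1], "\n"; // images/stories/otakuzoku1.jpg
echo $m2[1], "\n"; // app.js
```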

answered Sep 28 '10 at 17:08 by WNRosenberg

"exactly one img tag with no other markup"? That's a pretty specific case isn't it, maybe a bit too specific for almost everyone :[ – Andrew Dec 23 '15 at 17:18

mickmackusa · Answer 6 · 2019-05-15T13:37:14.823

This task should be handled by a DOM parser, because regex is unaware of DOM structure.

Code:

$row = (object)['introtext' => '<div>test</div><img src="source1"><p>text</p><img src="source2"><br>'];

$dom = new DOMDocument();
$dom->loadHTML($row->introtext);
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src');

Output:

source1

This says:

Parse the whole html string
Isolate all of the img tags
Isolate the first img tag
Isolate its src attribute value

Clean, appropriate, easy to read and manage.
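
The snippet above assumes the markup contains at least one img tag. As a hedged sketch, the same steps can be wrapped in a helper that returns null when there is nothing to find. The function name first_img_src is hypothetical, the type declarations assume PHP 7.1 or later, and libxml_use_internal_errors() is used here only to keep parse warnings for sloppy HTML out of the output.

```php
<?php
// Hypothetical helper (not from the original answer): returns the src of the
// first <img> in an HTML fragment, or null when there is no usable one.
function first_img_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) is null when the document has no img element.
    $img = $dom->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }

    return $img->getAttribute('src');
}

var_dump(first_img_src('<div>test</div><img src="source1"><p>text</p><img src="source2"><br>')); // string(7) "source1"
var_dump(first_img_src('<p>no images here</p>')); // NULL
```

This sketch only inspects the first img element; if that element has no src attribute it returns null rather than moving on to the next one.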
