3

From a string such as `<img src="/images/mylondon.jpg" />`, I'm trying to retrieve just the URL for use elsewhere in PHP.

I know regular expressions are the way to go, but I can't get my head around them right now.

Could anyone be of assistance?

shane
  • I've used the answer given below, which works, but is there a better way of doing this? It's not an entire document I'm searching through, just a couple of lines of HTML... – shane Sep 18 '11 at 14:22
  • "Regular expressions are the way to go" Somebody has been deceiving you. Regular expressions are only an acceptable way for regular languages. For the other languages, they can create massive problems. See also http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – luiscubal Sep 18 '11 at 15:38

2 Answers

7
preg_match_all('~<img.*?src=["\']+(.*?)["\']+~', $html, $urls);
$urls = $urls[1];
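As a quick check, here is a minimal sketch that runs the same pattern against the exact string from the question. The variable names `$html` and `$matches` are chosen for illustration only, and the pattern assumes the `src` value is quoted.

```php
<?php
// Sample input taken from the question.
$html = '<img src="/images/mylondon.jpg" />';

// Same pattern as above; capture group 1 holds the text between the quotes.
preg_match_all('~<img.*?src=["\']+(.*?)["\']+~', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured URLs.
$urls = $matches[1];

print_r($urls);
```

For this input, `$urls` should contain a single entry, `/images/mylondon.jpg`. Inputs with unquoted attribute values are not covered by this pattern.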
genesis
  • 3
    This regular expression won't work in a lot of situations. And as always, it's a bad idea to use regular expressions with HTML. – FtDRbwLXw6 Sep 18 '11 at 14:17
  • 1
    What would be the best way to grab the `url` from a line of `html` then? – shane Sep 18 '11 at 14:21
  • @shane A more maintainable way would be to use an HTML parser class, for example the [PHP Simple HTML DOM Parser](http://simplehtmldom.sourceforge.net/). – Alexander Sep 18 '11 at 15:00
  • @drrcknlsn: In this case it will work. Could you tell me a situation where the regex would not work on this? – genesis Sep 18 '11 at 15:46
  • @genesis It will not match `` and other cases. It will also match things that it should not, like ``. @shane You should use a DOM parser. – FtDRbwLXw6 Sep 18 '11 at 17:11
  • @drrcknlsn: ` – genesis Sep 18 '11 at 17:11
  • @genesis It will also fail for valid HTML like `bar` because it will match `foo alt="bar`. – FtDRbwLXw6 Sep 18 '11 at 17:20
  • @drrcknlsn: tell me one single popular website which is not using quotes – genesis Sep 18 '11 at 17:21
  • @genesis It doesn't matter how unpopular it is. It is valid HTML and sites do use it, unfortunately, which makes this solution unreliable. It is better to use a DOM parser. – FtDRbwLXw6 Sep 18 '11 at 17:25
  • @drrcknlsn: Just a question OT: How does domdocument parse it? – genesis Sep 18 '11 at 17:27
2

I think it would be better to use a DOMDocument object:

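// Sample markup containing the image tag from the question.
$text = '<html><body><p>ala bala</p><img src="/images/mylondon.jpg" /></body></html>';
$htmlDom = new DOMDocument;
$htmlDom->loadHTML($text);

// Collect every <img> element in the parsed document.
$imageTags = $htmlDom->getElementsByTagName('img');

// Read the src attribute of each image.
$extractedImages = array();
foreach ($imageTags as $imageTag) {
    $extractedImages[] = $imageTag->getAttribute('src');
}

// Dump the result for inspection.
echo '<pre>'; var_dump($extractedImages); exit;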
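Since the question mentions only a couple of lines of HTML rather than a whole document, here is a hedged sketch of the same idea wrapped in a helper that accepts a bare fragment. The function name `extractImageSources` is hypothetical and not part of the answer; it assumes the `dom` extension is enabled and silences libxml parse warnings while loading.

```php
<?php
// Hypothetical helper (not from the answer): returns the src value of
// every <img> element found in an HTML fragment.
function extractImageSources(string $fragment): array
{
    // loadHTML() rejects an empty string, so return early.
    if (trim($fragment) === '') {
        return [];
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($fragment);

    // Discard collected warnings and restore the previous error setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        // Skip <img> elements that have no src attribute.
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }

    return $sources;
}

print_r(extractImageSources('<img src="/images/mylondon.jpg" />'));
```

With the string from the question, the returned array should hold the single value `/images/mylondon.jpg`. The parser wraps the fragment in `html` and `body` elements on its own, so no surrounding document is needed.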
Pavel Kenarov