
How do I get all image paths from a string? Note: I only want the paths containing the word "media".

For example, given this string (part of the DOM):

<div class="my-class">
   <img src="http://my-website.com/cache/media/2017/10/img67.jpeg" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/2017/10/img68.png" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/media/2017/10/img69.jpg" class="" alt="test" width="120" height="100">
   <h2 class="uk-margin-top-remove">About us</h2>                
</div>

I want an array with a result similar to this:

array(
  [0] => "http://my-website.com/cache/media/2017/10/img67.jpeg"
  [1] => "http://my-website.com/cache/media/2017/10/img69.jpg"
);

I don't want the second img because its src attribute doesn't contain the word "media".

splunk
  • please state your current attempt – Rotimi Jan 25 '18 at 18:56
  • You have been a member long enough to know you should include code. – Andreas Jan 25 '18 at 18:56
  • Dig into DOMDocument object, and try something out. Learning is best by attempting! You'll also stumble across all kinds of fun new things while you try. ... and you have way too many questions without accepted answers. – IncredibleHat Jan 25 '18 at 18:57
  • 2
    Don't use regex for this. Use a parser: [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – ctwheels Jan 25 '18 at 18:58
  • Doing a simple search like `php get src from img` gives you lots of answers. Then you can easily tailor it to your issue. FYI, `regex` is not ideal here; `DOMDocument()` is right. – Rotimi Jan 25 '18 at 18:59
  • I would use a parser but I can't in this particular case. – splunk Jan 25 '18 at 19:02
  • @ctwheels: I could say the same with the pony link but I wouldn't dare... – Jan Jan 25 '18 at 19:05

2 Answers


You could use preg_match_all() to get the URLs, but it is even better to use a DOM parser.

$str = '<div class="my-class">
   <img src="http://my-website.com/cache/media/2017/10/img67.jpeg" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/2017/10/img68.png" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/media/2017/10/img69.jpg" class="" alt="test" width="120" height="100">
   <h2 class="uk-margin-top-remove">About us</h2>                
</div>' ;

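// Capture each URL that starts with http://my-website.com/cache/media/;
// the lazy (.*?) stops at the first double quote, i.e. the end of the src value.
preg_match_all('~(http://my-website\.com/cache/media/(.*?))"~i', $str, $matches);

// Group 1 holds the full URLs
var_dump($matches[1]);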

Which returns:

array(2) {
  [0]=>
  string(52) "http://my-website.com/cache/media/2017/10/img67.jpeg"
  [1]=>
  string(51) "http://my-website.com/cache/media/2017/10/img69.jpg"
}
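The pattern above hard-codes the domain. A minimal sketch of a domain-independent variant, assuming every src value is wrapped in double quotes (single-quoted or unquoted attributes would be missed); `$mediaPaths` is just an illustrative name:

```php
<?php

$str = '<div class="my-class">
   <img src="http://my-website.com/cache/media/2017/10/img67.jpeg" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/2017/10/img68.png" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/media/2017/10/img69.jpg" class="" alt="test" width="120" height="100">
   <h2 class="uk-margin-top-remove">About us</h2>
</div>';

// Match src="..." values that contain /media/ anywhere, on any host.
// [^"]* keeps each match inside a single double-quoted attribute value.
preg_match_all('~\bsrc="([^"]*/media/[^"]*)"~i', $str, $matches);

// Group 1 holds the matching paths.
$mediaPaths = $matches[1];
print_r($mediaPaths);
```

It is still a regex, so the same caveats about HTML parsing apply.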
Syscall

Some boilerplate code to get you started:

<?php

$data = <<<DATA
<div class="my-class">
   <img src="http://my-website.com/cache/media/2017/10/img67.jpeg" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/2017/10/img68.png" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/media/2017/10/img69.jpg" class="" alt="test" width="120" height="100">
   <h2 class="uk-margin-top-remove">About us</h2>                
</div>
DATA;

# set up the dom
$dom = new DOMDocument();
# LIBXML_HTML_NOIMPLIED: don't add implied <html>/<body> wrappers; LIBXML_HTML_NODEFDTD: don't add a default doctype
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

# set up the xpath
$xpath = new DOMXPath($dom);

foreach ($xpath->query("//img[contains(@src, '/media/')]/@src") as $image) {
    echo $image->nodeValue . "\n";
}

Which yields:

http://my-website.com/cache/media/2017/10/img67.jpeg
http://my-website.com/cache/media/2017/10/img69.jpg


This loads the DOM and runs an XPath query that selects the src attribute of every image whose src contains /media/; we then loop over the results.
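Since the question asks for an array rather than printed lines, here is a minimal sketch that collects the same XPath results into one, assuming the dom extension is enabled. The `libxml_use_internal_errors()` call is an optional addition that keeps parser warnings from real-world, messy HTML off the output; `$paths` is an illustrative name.

```php
<?php

$data = <<<DATA
<div class="my-class">
   <img src="http://my-website.com/cache/media/2017/10/img67.jpeg" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/2017/10/img68.png" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/media/2017/10/img69.jpg" class="" alt="test" width="120" height="100">
   <h2 class="uk-margin-top-remove">About us</h2>
</div>
DATA;

$dom = new DOMDocument();

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Build a plain PHP array of the matching src attribute values.
$paths = [];
foreach ($xpath->query("//img[contains(@src, '/media/')]/@src") as $attr) {
    $paths[] = $attr->nodeValue;
}

print_r($paths);
```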
If for some reason (why?) you are unable to use a DOM parser, you could use the second-best option, a regular expression:
<img
(?s:(?!>).)+?
src=(['"])
(?P<src>(?:(?!\1).)+?/media/(?:(?!\1).)*)
\1

Then read the `src` group. The pattern is written in free-spacing mode, so it needs the `x` modifier; see a demo on regex101.com.
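A minimal sketch of wiring a pattern like this into preg_match_all(). In this version the closing quote is matched outside the `src` group, so it is not part of the captured path; the nowdoc avoids escaping the quotes and backslashes, and `$mediaSrcs` is an illustrative name.

```php
<?php

$html = <<<'HTML'
<div class="my-class">
   <img src="http://my-website.com/cache/media/2017/10/img67.jpeg" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/2017/10/img68.png" class="" alt="test" width="120" height="100">
   <img src="http://my-website.com/cache/media/2017/10/img69.jpg" class="" alt="test" width="120" height="100">
   <h2 class="uk-margin-top-remove">About us</h2>
</div>
HTML;

// Free-spacing (x) pattern: whitespace and # comments inside it are ignored.
$pattern = <<<'REGEX'
~<img
(?s:(?!>).)+?                               # rest of the tag up to src=, never crossing >
src=(['"])                                  # opening quote, captured as group 1
(?P<src>(?:(?!\1).)+?/media/(?:(?!\1).)*)   # attribute value containing /media/
\1                                          # matching closing quote, outside the src group
~x
REGEX;

preg_match_all($pattern, $html, $matches);

// The named group is exposed under the 'src' key, one entry per match.
$mediaSrcs = $matches['src'];
print_r($mediaSrcs);
```

Because the opening quote is captured and reused via `\1`, both single- and double-quoted src values are handled.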

Jan