I have a bunch of strings already extracted from an HTML file. Examples:

<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys&lt;p&gt;&lt;span class='points-q7Vdm'&gt;18,736&lt;/span&gt;&nbsp;&lt;span class='points-text-q7Vdm'&gt;points&lt;/span&gt;  : 316,091 views&lt;/p&gt;">

<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">

<img src="//s.imgur.com/images/blog_rss.png">

I am trying to write a regular expression that will grab the src="URL" part of the img tag so that I can replace it later based on a few other conditions. The many quotation marks are giving me the most trouble. I'm still relatively new to regex, so a lot of the tricks are beyond me.

Thanks in advance

Andy Lester
EyeOfTheHawks
  • I think you didn't see one famous answer on RegEx+HTML http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – nikis May 08 '14 at 18:48
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester May 08 '14 at 18:49
  • Wow, I am using a simple_html_dom php file to do the exact same thing I am asking for. Brain fart to the max, thanks @AndyLester – EyeOfTheHawks May 08 '14 at 18:51

2 Answers

Use DOM or another parser for this; don't try to parse HTML with regular expressions.

Example:

$html = <<<DATA
<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys&lt;p&gt;&lt;span class='points-q7Vdm'&gt;18,736&lt;/span&gt;&nbsp;&lt;span class='points-text-q7Vdm'&gt;points&lt;/span&gt;  : 316,091 views&lt;/p&gt;">
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
DATA;

$doc = new DOMDocument();
$doc->loadHTML($html); // load the html

$xpath = new DOMXPath($doc);
$imgs  = $xpath->query('//img');

foreach ($imgs as $img) {
   echo $img->getAttribute('src') . "\n";
}

Output

//i.imgur.com/tApg8ebb.jpg
//i.imgur.com/SwmwL4Gb.jpg
//s.imgur.com/images/blog_rss.png

If you would rather store the results in an array, you could do this:

$sources = []; // collect each src value
foreach ($imgs as $img) {
   $sources[] = $img->getAttribute('src');
}

print_r($sources);

Output

Array
(
    [0] => //i.imgur.com/tApg8ebb.jpg
    [1] => //i.imgur.com/SwmwL4Gb.jpg
    [2] => //s.imgur.com/images/blog_rss.png
)
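
The question also mentions replacing the src later based on some conditions. As a rough sketch of how that might look with the same DOM approach, the example below rewrites the attribute with setAttribute and serializes each element with saveHTML. The condition used here (only touching URLs on i.imgur.com) and the https: prefix are made-up placeholders, not something from the question.

```php
<?php
// Two of the img strings from the question, used as sample input.
$html = <<<DATA
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
DATA;

$doc = new DOMDocument();
$doc->loadHTML($html);

$updated = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');

    // Hypothetical condition: only rewrite images served from i.imgur.com.
    if (strpos($src, '//i.imgur.com/') === 0) {
        $img->setAttribute('src', 'https:' . $src);
    }

    // Passing a node to saveHTML() serializes just that element.
    $updated[] = $doc->saveHTML($img);
}

print_r($updated);
```

Because the parser deals with the quoting, the same loop should also cope with single-quoted attributes or a title full of escaped markup, which is where a regex tends to break.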
hwnd
$pattern = '/<img.+src="([\w\/._\-]+)"/';

I'm not sure which language you're using, so quote syntax will vary.
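
For PHP specifically, here is a minimal sketch of how a pattern like this might be applied with preg_match, assuming each string holds exactly one img tag with a double-quoted src. The variable names are illustrative. Note that when `/` is the delimiter, any `/` inside the pattern has to be escaped, otherwise PHP reports an unknown modifier.

```php
<?php
// Sample input: two of the img strings from the question.
$tags = [
    '<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">',
    '<img src="//s.imgur.com/images/blog_rss.png">',
];

// The slash inside the character class is escaped because "/" is the delimiter.
$pattern = '/<img.+src="([\w\/._\-]+)"/';

$sources = [];
foreach ($tags as $tag) {
    // $m[1] holds the first capture group, i.e. the URL between the quotes.
    if (preg_match($pattern, $tag, $m)) {
        $sources[] = $m[1];
    }
}

print_r($sources);
```

This stays brittle: a single-quoted src, or a URL containing characters outside the class (such as `?` or `=`), would not match, which is part of why the comments recommend a parser instead.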

mts7