I have a specific case where I would prefer not to use Cheerio or jsdom, so I need a flexible regular expression that finds all relative image paths, even in invalid markup. My current attempt has two issues, which can be seen at http://www.regexr.com/3bkil:
- The match should not include the leading single or double quote (' or ").
- Tags that are missing the opening less-than symbol (<) should not match.
Here is what I have so far:
(?!(\<\s*img [^\>]*src\s*=\s*))[\"\']\s*[\w\.\-\/]+(\.(png|jpg|jpeg|gif))(?=(.(\"|\')|(\"|\')))
Almost there. Here are the test cases.
TO MATCH:
<img src="images/vendor.png" alt="" > <img src="images/vendor.gif" class="box-bg-image" alt="" >
<img src="images/vendor-dp-20141009-flatware.jpg" class="box-bg-image" alt="" >
<img src="images/vendor-flatware.jpeg" class="box" alt="" >
<img src='images/vendor-flatware.jpeg' class="box" >
<img alt="" src= 'images/vendor-flatware.jpeg' alt="" >
<img src=' images/vendor-flatware.jpg' alt="" >
<img src=' images/vendor-flatware.gif' alt="" >
<img src=' images/vendor-flatware.png ' alt="" >
<img src='../silverware.png' alt="" >
<img class="box" src='images/vendor-watch.png' alt="" >
<img src=" images/vendor-flatware.jpeg " alt="" >
< img src="images/vendor-flatware.jpeg " alt="" >
< img src="images/vendor-flatware.jpeg " alt="" >
<img src="vendor.gif" alt="">
NOT TO MATCH:
<img src="http://thirdpartycdn.com/image.jpg">
<img src='http://thirdpartycdn.com/image.png'>
<img src="http://thirdpartycdn.com/image.gif" class="box-bg-image" alt="">
img src="images/vendor-flatware.jpeg "
<img src="images/vendorpng" alt="" >
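To make the two requirements concrete, here is a rough sketch of one possible direction, not a vetted solution. It assumes it is acceptable to read the path from a capture group instead of relying on lookarounds: the pattern consumes the opening `<`, the `img` tag name, and the quote, and only the path lands in group 1. Absolute URLs fail because `:` is not in the allowed character class. The names `RELATIVE_IMG_SRC` and `findRelativeImagePaths` are made up for illustration.

```javascript
// Illustrative sketch: the pattern and the names below are assumptions, not a tested library.
// <\s*img\s      requires the opening "<" (optionally followed by spaces) and the tag name
// [^>]*?         skips other attributes without leaving the tag
// \bsrc\s*=\s*   the src attribute, with optional whitespace around "="
// ["']\s*        opening quote and optional padding, consumed but not captured
// ( ... )        group 1: a path made of word characters, ".", "-" and "/" (no ":"),
//                ending in one of the allowed image extensions
// \s*["']        optional padding and the closing quote
const RELATIVE_IMG_SRC =
  /<\s*img\s[^>]*?\bsrc\s*=\s*["']\s*([\w.\-\/]+\.(?:png|jpe?g|gif))\s*["']/gi;

// Returns every captured relative image path found in the given HTML string.
function findRelativeImagePaths(html) {
  return Array.from(html.matchAll(RELATIVE_IMG_SRC), (m) => m[1]);
}

// Example usage with one of the test cases above.
console.log(findRelativeImagePaths(`<img src=' images/vendor-flatware.png ' alt="" >`));
```

This sketch does not check that the opening and closing quotes are the same character, and `\bsrc` would also accept an attribute such as `data-src`. Neither case appears in the test cases above.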
Any help would be appreciated!