When we have:

<img src="http://example.com/myimageurl.jpg" />
<img class="my-image-class" src="http://example.com/myimageurl2.jpg" />

With grep:

grep -Po '(?<=src=")[^"]*' filename

I get both image URLs.

Actually, I need only the URL of the image with the specific class "my-image-class".

How can I achieve that?

Thanks in advance!

Peter

2 Answers

You shouldn't parse HTML with regex, but if you are looking for a quick grep
regex, this works:

<img\s+(?=[^>]*?(?<=\s)class\s*=\s*"my-image-class")[^>]*?(?<=\ssrc=")([^"]*)

Expanded

 <img \s+ 
 (?=
      [^>]*? 
      (?<= \s )
      class \s* = \s* "my-image-class"
 )
 [^>]*? 
 (?<= \s src=" )
 ( [^"]* )                     # (1)

Output

 **  Grp 0 -  ( pos 49 , len 67 ) 
<img class="my-image-class" src="http://example.com/myimageurl2.jpg  
 **  Grp 1 -  ( pos 82 , len 34 ) 
http://example.com/myimageurl2.jpg  
  • sln, I am actually writing this as a shell script file so I will stick with grep. Your suggestion works partly because I need only the URL http://example.com/myimageurl2.jpg, not the whole IMG tag. – Peter Feb 15 '16 at 00:30
  • @Peter - Doesn't grep give you the whole line, or is it just the match? Well, the bad news is you can't match just the jpeg when finding the class. 20 years of experience tells me this to be the case. Btw, the jpeg is in capture group 1. Can't you get that without grep? An option is if the Perl it uses supports the `\K`, then you could use `<img\s+(?=[^>]*?(?<=\s)class\s*=\s*"my-image-class")[^>]*?(?<=\ssrc=")\K[^"]*` –  Feb 15 '16 at 00:33
  • Thank you very much, sln! This worked perfectly! You saved me a lot of time and efforts! – Peter Feb 15 '16 at 00:52
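
For use in a shell script, the `\K` variant from the comments can be wrapped in a small function. This is only a sketch: it assumes GNU grep built with PCRE support (the `-P` option), and the function name `extract_src` is a hypothetical helper, not something from the thread. The sample input is the two tags from the question.

```shell
#!/usr/bin/env bash
# Sketch: print the src URL of every <img> tag whose class attribute is
# exactly "my-image-class".
# Assumption: GNU grep with PCRE support (-P). extract_src is a hypothetical name.

extract_src() {
  # -o prints only the matched text; \K drops everything matched before it,
  # so only the URL after src=" is printed.
  # With no file arguments, grep reads standard input.
  grep -Po '<img\s+(?=[^>]*?(?<=\s)class\s*=\s*"my-image-class")[^>]*?(?<=\ssrc=")\K[^"]*' "$@"
}

# Sample input taken from the question.
sample='<img src="http://example.com/myimageurl.jpg" />
<img class="my-image-class" src="http://example.com/myimageurl2.jpg" />'

printf '%s\n' "$sample" | extract_src
```

Calling `extract_src filename` reads from a file instead of standard input. Note that the pattern only matches when the class attribute holds that single class name; a value such as `class="foo my-image-class"` would need a looser pattern or a real HTML parser.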

Not sure what your use case is here but you could easily do this by pasting your HTML in a site like http://jsbin.com and writing a few lines of jQuery:

var imgs = [];

$('img').each( function() {
  var $img = $(this);

  if( $img.hasClass('my-image-class') ) {
    imgs.push($img.attr('src'));
  }
});

console.log(imgs);

Demo: http://jsbin.com/cicoli/edit?html,js,console,output

Ted Whitehead