22

I'm trying to match all the image elements in an HTML string.

This is my regex:

html.match(/<img[^>]+src="http([^">]+)/g);

This works, but I want to extract the src of every image. So when I execute the regular expression on this string:

<img src="http://static2.ccn.com/ccs/2013/02/img_example.jpg" />

it should return:

"http://static2.ccn.com/ccs/2013/02/img_example.jpg"

Default
  • 5
    Don't use regex to parse html. – Evan Davis Feb 18 '13 at 15:06
  • I have to do it with regex –  Feb 18 '13 at 15:08
  • 4
    @Tomirammstein, why do you have to do it with a regex when Javascript has DOM built in? –  Feb 18 '13 at 15:09
  • @Tomirammstein In which environment is your JavaScript code executing? If it's a web browser, just parse the HTML string into a DOM tree. – Šime Vidas Feb 18 '13 at 15:09
  • Too bad... with JQuery it would be `$('img[src="http://static2.ccn.com/ccs"]').each(function(){});` – sdespont Feb 18 '13 at 15:10
  • @dan1111 Not exactly. JavaScript is just a scripting language. The DOM is built into *web browsers*, not JavaScript. – Šime Vidas Feb 18 '13 at 15:12
  • 1
    I'm using node.js, so, I can't parse it into an HTML tree –  Feb 18 '13 at 15:15
  • https://github.com/harryf/node-soupselect maybe this could help – VoronoiPotato Feb 18 '13 at 15:17
  • 2
    @Tomirammstein Check this out: http://stackoverflow.com/questions/7977945/html-parser-on-nodejs – Šime Vidas Feb 18 '13 at 15:17
  • 2
    @Tomirammstein Don't you think it would've been helpful to tag this question as `node.js` in the first place? – Ian Feb 18 '13 at 15:20
  • Don't you think that node.js is based on JavaScript? –  Feb 18 '13 at 15:44
  • Yes but they aren't the same. You said it yourself, node.js is **based** on Javascript - it doesn't include everything and isn't perfectly identical. I'm just saying, tagging it correctly and explaining it better could've helped get a more direct and correct solution, faster. – Ian Feb 20 '13 at 02:58
  • This regex does not work when the entire HTML document is a string and I want to find the image URL in it. Can anyone help? https://stackoverflow.com/questions/57883657/find-if-content-has-current-website-url-in-javascript – appsntech Sep 11 '19 at 15:19

6 Answers

29

You need to use a capture group, ( ), to extract the URLs. If you also want to match globally with the g flag (that is, more than once), you need to call exec in a loop, because match ignores capture groups when matching globally.

For example

var m,
    urls = [], 
    str = '<img src="http://site.org/one.jpg" />\n <img src="http://site.org/two.jpg" />',
    rex = /<img[^>]+src="?([^"\s]+)"?\s*\/>/g;

while ( m = rex.exec( str ) ) {
    urls.push( m[1] );
}

console.log( urls ); 
// [ "http://site.org/one.jpg", "http://site.org/two.jpg" ]
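On newer runtimes, the same idea can be written without a manual loop. The following is a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020, for example Node.js 12 or later). The function name extractImageSources and the sample markup are hypothetical and not part of the original answer. The pattern also tolerates single quotes and extra attributes before or after src, but it is still a regex and can be fooled by unusual markup.

```javascript
// Hypothetical helper: collect the src value of every <img> tag in an HTML string.
// Assumes the src value is wrapped in single or double quotes.
function extractImageSources(html) {
  // \ssrc requires whitespace before "src", so attributes like data-src are skipped.
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["'][^>]*>/gi;
  // matchAll requires the g flag and yields one match object per <img> tag;
  // index 1 of each match holds the first capture group (the URL).
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

const sampleHtml = `<img src="http://site.org/one.jpg" />
<img alt="two" src='http://site.org/two.jpg' width="10">`;

console.log(extractImageSources(sampleHtml));
// [ 'http://site.org/one.jpg', 'http://site.org/two.jpg' ]
```

Unlike match with the g flag, matchAll keeps the capture groups for every match, which is why the loop around exec is not needed here.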
MikeM
  • 2
    Ended up with this instead. Otherwise, it doesn't pick up all images. /<img[^>]+src="([^">]+)/g – juminoz May 22 '13 at 01:44
  • 3
    Sometimes the img tag may have height or some other attribute after the src attribute, so the regex should be rex = /<img[^>]+src="?([^"\s]+)"?[^>]*\/>/g; – S B Jan 15 '14 at 06:47
  • 7
    seems that this regex not works on all img tags, but this works /]*\/([^">]*?))".*?>/g; – norman784 Sep 22 '14 at 20:21
8
var myRegex = /<img[^>]+src="(http:\/\/[^">]+)"/g;
var test = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />';
myRegex.exec(test);
aleph_null
  • Thank you for your answer. It helped me. I just want to add this: `var src = myRegex.exec(test); console.log('SRC: ' + src[1]);` – akelec May 14 '16 at 20:29
7

As Mathletics mentioned in a comment, there are other more straightforward ways to retrieve the src attribute from your <img> tags such as retrieving a reference to the DOM node via id, name, class, etc. and then just using your reference to extract the information you need. If you need to do this for all of your <img> elements, you can do something like this:

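var imageTags = document.getElementsByTagName("img"); // Returns a live HTMLCollection of <img> DOM nodes
var sources = [];
// Use an indexed loop: for...in would also visit non-element properties such as length
for (var i = 0; i < imageTags.length; i++) {
   sources.push(imageTags[i].src);
}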

However, if you have some restriction forcing you to use regex, then the other answers provided will work just fine.

Default
2

Perhaps this is what you are looking for:

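I slightly modified your regex and then used the exec function, which returns an array describing a single match: results[0] is the full matched text and results[1] is the first capture group. Because the regex has the g flag, calling exec again on the same string returns the next match, and it returns null once there are no more.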

var html = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />';

// Capture the whole URL, including the scheme, in group 1
var re = /<img[^>]+src="(http:\/\/[^">]+)/g;
var results = re.exec(html);

var source = results[1]; // the captured src URL
console.log(source);
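To make the behaviour of exec with the g flag concrete, here is a small sketch. The two-image sample string is an assumption made for illustration. It shows that each call returns one match, that the capture group sits at index 1, and that the regex object remembers where to resume through its lastIndex property.

```javascript
const re = /<img[^>]+src="(http:\/\/[^">]+)/g;
const html =
  '<img src="http://site.org/one.jpg" /><img src="http://site.org/two.jpg" />';

const first = re.exec(html);  // first match; the search position is stored in re.lastIndex
const second = re.exec(html); // resumes after the first match
const third = re.exec(html);  // no more matches: returns null and resets lastIndex to 0

console.log(first[1]);  // http://site.org/one.jpg
console.log(second[1]); // http://site.org/two.jpg
console.log(third);     // null
```

Because of this stored state, reusing one global regex object across different strings can skip matches unless lastIndex is reset to 0 first.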
Stasel
1

You can use an HTML parser and avoid regex altogether.

var parser = require('node-html-parser'); // third-party package: npm install node-html-parser

var html = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />';

// Parse the markup, select the first <img>, and read its src attribute
var src = parser.parse(html).querySelector('img').getAttribute('src');

console.log(src);
// 'http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg'
bejczib
  • Please provide additional details in your answer. As it's currently written, it's hard to understand your solution. – Community Aug 31 '21 at 14:37
-1

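You can access the src value through a capture group. Inside a regex literal the forward slashes of the URL must be escaped, and the dots should be escaped so that they match literally:

// Group 1 captures the URL, starting at http://static2.ccn.com/ccs
var yourRegex = /<img[^>]+src\s*=\s*"(http:\/\/static2\.ccn\.com\/ccs[^">]+)/g;
var match = yourRegex.exec(yourString);
if (match) { // exec returns null when nothing matches
    console.log(match[1]); // src value
}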
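If the URL prefix is only known at run time, escaping it by hand is not an option. The sketch below builds the pattern with the RegExp constructor instead. The names escapeRegExp and extractSourcesWithPrefix are hypothetical helpers introduced for illustration, and the code assumes double-quoted src values and a runtime with String.prototype.matchAll.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex,
// so the prefix is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the src of every <img> whose URL starts with prefix.
function extractSourcesWithPrefix(html, prefix) {
  const pattern = new RegExp(
    '<img\\b[^>]*?\\ssrc\\s*=\\s*"(' + escapeRegExp(prefix) + '[^">]*)"',
    "gi"
  );
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

const page =
  '<img src="http://static2.ccn.com/ccs/2013/02/img_example.jpg" />' +
  '<img src="http://other.example/x.png" />';

console.log(extractSourcesWithPrefix(page, "http://static2.ccn.com/ccs"));
// [ 'http://static2.ccn.com/ccs/2013/02/img_example.jpg' ]
```

Forward slashes need no escaping here because the pattern is passed as a string rather than written as a regex literal.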
Anirudha