I want to extract the URLs from a Bing search results page. I fetch the HTML, and when I run the regex /<h2><a href="(.*?)"/g against it, I get the full matches instead of just the URLs:

["<h2><a href="https://www.test.com/"", "<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"", "<h2><a href="http://www.speedtest.net/"", "<h2><a href="http://test.psychologies.com/"", "<h2><a href="http://www.thefreedictionary.com/test"", "<h2><a href="http://fr.wikipedia.org/wiki/Test"", "<h2><a href="http://www.wordreference.com/enfr/test"", "<h2><a href="http://www.sedecouvrir.fr/"", "<h2><a href="http://www.jeuxvideo.com/tests.htm"", "<h2><a href="http://en.wikipedia.org/wiki/Test""]

In my JavaScript code, I used match:

html.match(/<h2><a href="(.*?)"/g);

I only want the URLs. The HTML comes from http://www.bing.com/search?q=test. I have been searching all day, and I think I may need to use a capture group?

azizk

3 Answers

Use Array.prototype.map to iterate over the array of matched strings, and run the regular expression on each one with exec to read the URL from capture group 1.

"use strict";

var links = ['<h2><a href="https://www.test.com/"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"', 
 '<h2><a href="http://www.speedtest.net/"', 
 '<h2><a href="http://test.psychologies.com/"',
 '<h2><a href="http://www.thefreedictionary.com/test"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test"',
 '<h2><a href="http://www.wordreference.com/enfr/test"',
 '<h2><a href="http://www.sedecouvrir.fr/"',
 '<h2><a href="http://www.jeuxvideo.com/tests.htm"',
 '<h2><a href="http://en.wikipedia.org/wiki/Test"'];

var result = links.map(function (link) {
  return /<h2><a href="(.*?)"/.exec(link)[1];
});

console.log(result);
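
When the input is the raw HTML string rather than the already-matched array, the same capture group can be read in a single pass. This is a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020). The helper name extractUrls and the sample markup are made up for illustration and are not the real Bing HTML.

```javascript
// Hypothetical helper: pull the href of every <h2><a ...> out of an HTML string.
function extractUrls(html) {
  // matchAll requires the g flag and yields one match object per hit,
  // each carrying its capture groups (unlike match() with the g flag).
  var re = /<h2><a href="(.*?)"/g;
  return Array.from(html.matchAll(re), function (m) {
    return m[1]; // group 1 is the URL
  });
}

// Made-up sample input for illustration only.
var sampleHtml = '<h2><a href="https://www.test.com/">Test</a></h2>' +
                 '<h2><a href="http://www.speedtest.net/">Speedtest</a></h2>';

console.log(extractUrls(sampleHtml));
```

This avoids the intermediate array of full matches, at the cost of needing a reasonably recent JavaScript engine.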
Kelsadita
  • The `g` flag in `/g` is not needed there. `/g` is for multiple matches. You're iterating over an array list of items guaranteed to provide only a single match. – JayC Dec 20 '14 at 16:08

What match returns is an array of the full matches, so you need to map over it, and you need a capture group to pull the URL out of each entry. Something like this:

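// match() with the g flag returns only the full matches (or null if there are none),
// so strip each matched string down to its captured URL.
var matches = html.match(/<h2><a href="(.*?)"/g) || [];
var urls = matches.map(function (str) {
   return str.replace(/.*href="([^"]+).*/, "$1");
});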
Amit Joki

If this is being done within a browser, there's really no need to try to use a regex.

// Log the href of every anchor on the page.
var myNodeList = document.getElementsByTagName('a');
for (var i = 0; i < myNodeList.length; ++i) {
    var anchor = myNodeList[i];
    console.debug(anchor.href);
}
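
The loop above visits every anchor on the page. To mirror the regex in the question, the lookup can be narrowed to anchors that sit directly inside an h2. This is only a sketch, assuming the result links really are direct children of h2 elements as the regex implies. The helper name collectHeadingLinks is hypothetical, and root stands for any object exposing querySelectorAll, such as document in a browser.

```javascript
// Hypothetical helper: collect the href of each <a> that is a direct child of an <h2>.
// `root` is assumed to expose querySelectorAll (for example, `document` in a browser).
function collectHeadingLinks(root) {
  var anchors = root.querySelectorAll('h2 > a');
  // Array.from copies the NodeList into a real array and maps it in one step.
  return Array.from(anchors, function (a) {
    return a.href;
  });
}

// In a browser console on the results page:
// console.log(collectHeadingLinks(document));
```

Passing the root in as a parameter keeps the function usable on a parsed fragment as well as on the live document.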

But as hinted in the comments, if you really want to use a regex, all you need to do is iterate over the matches, as shown in "How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?". In particular, note the lines:

while (match = re.exec(url)) {
     params[decode(match[1])] = decode(match[2]);
}
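
The quoted lines come from a query-string parser, so here is a rough adaptation of the same exec loop to the regex from the question. The helper name extractUrlsWithExec and the sample markup are assumptions made for illustration, not part of the linked answer.

```javascript
// Hypothetical helper: repeated exec() calls on a g-flagged regex walk through
// the string, and each match object exposes the capture group at index 1.
function extractUrlsWithExec(html) {
  var re = /<h2><a href="([^"]*)"/g;
  var urls = [];
  var match;
  // exec() returns null once no further match is found, which ends the loop.
  while ((match = re.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Made-up sample input for illustration only.
var execSampleHtml = '<h2><a href="http://fr.wikipedia.org/wiki/Test">Test</a></h2>' +
                     '<h2><a href="http://en.wikipedia.org/wiki/Test">Test</a></h2>';

console.log(extractUrlsWithExec(execSampleHtml));
```

The g flag matters here: without it, exec() would keep returning the first match and the loop would never finish.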
JayC