2

RSS reader returns

Submitted by
<a href="http://www.reddit.com/user/guiness_as_usual">
    guiness_as_usual
</a><br/>
<a href="https://www.spaceglasses.com/">
    [link]
</a>
<a href="http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/">
    [242 comments]
</a>

What I need to do is capture the 2nd and 3rd href attributes into two separate variables, and I have to do it in JavaScript. Does anyone have an idea how to capture these two values with a JavaScript regexp?

// EDIT: I'm looking for exactly this, but in JavaScript: http://rubular.com/r/ESRimQsZHc I want to be able to capture result[0], result[1] and result[2].

Filip Bartuzi
  • Have you tried it yourself before I give you an answer? – putvande Aug 19 '13 at 15:39
  • You are probably going to get a stream of "don't parse HTML with regex". That is generally good advice if you cannot guarantee the structure of your input. Are you absolutely certain that the RSS reader will always return data in exactly the structure you've posted? – JDB Aug 19 '13 at 15:40
  • This is a DOM fragment, you should probably be using DOM traversal methodologies to get at the values you seek (for example, jQuery would make this a very simple proposition). – Mike Brant Aug 19 '13 at 15:57

3 Answers

1

You could use DOMParser, like so:

var parser = new DOMParser();
var tempDoc = parser.parseFromString(htmlStr,"text/html");
var anchor2 = tempDoc.getElementsByTagName('a')[1];
var anchor3 = tempDoc.getElementsByTagName('a')[2];
var href2 = anchor2.getAttribute("href");//or anchor2.href; to get fully qualified link
var href3 = anchor3.getAttribute("href");//or anchor3.href; to get fully qualified link
MDEV
1

As the answers to this question explain, you can't reliably parse HTML with a regular expression. This answer shows how to parse HTML in JavaScript. So, try this:

var el = document.createElement('div');
el.innerHTML = yourRssString;
var innerElements = el.getElementsByTagName('a');
var secondHref = innerElements[1].getAttribute('href');
var thirdHref = innerElements[2].getAttribute('href');
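
Both DOM-based snippets index the anchor list directly, which throws if the feed ever returns fewer than three links (the concern raised in the comments on the question). A minimal sketch of a guard, assuming a hypothetical helper named `secondAndThirdHref` (not part of any library) that accepts any array-like of elements, such as the result of `getElementsByTagName('a')`:

```javascript
// Hypothetical helper (the name is illustrative, not from any library).
// `anchors` is any array-like of elements exposing getAttribute(),
// e.g. el.getElementsByTagName('a') from the snippet above.
function secondAndThirdHref(anchors) {
    // Return null instead of throwing when the expected structure is missing.
    if (!anchors || anchors.length < 3) {
        return null;
    }
    return {
        link: anchors[1].getAttribute('href'),     // 2nd <a>: the [link] URL
        comments: anchors[2].getAttribute('href')  // 3rd <a>: the comments URL
    };
}
```

In a browser it could be called as `secondAndThirdHref(innerElements)`; a `null` result means the markup did not contain the expected three links.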
ProgramFOX
1

If you absolutely need to use a regexp, you can try this:

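var text = 'submitted by <a href="http://www.reddit.com/user/guiness_as_usual"> guiness_as_usual </a> <br/> <a href="https://www.spaceglasses.com/">[link]</a> <a href="http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/">[242 comments]</a>',
    hrefs = [],
    search = /href="([^"]+)"/g,
    hreftmp; // declared here so it does not leak as an implicit global
// With the g flag, exec() returns the next match on each call, then null.
while ((hreftmp = search.exec(text)) !== null) {
    hrefs.push(hreftmp[1]); // keep only the captured URL, not the whole match array
}

document.write(hrefs[1]); // 2nd href
document.write(hrefs[2]); // 3rd href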

It's simple and works with your example.

FlorianL
  • @user2686462: `[^"]` means: _all characters, except `"`_ – ProgramFOX Aug 19 '13 at 16:20
  • @user2686462 As ProgramFOX said: `["]+` matches one or more consecutive `"` characters. Adding `^` negates the class, so `[^"]+` matches one or more consecutive characters that are not `"`. We know that an href value sits between two `"`, so to get the complete URL we capture the characters between them that are not `"`. Can you accept my answer if it answers your question? – FlorianL Aug 20 '13 at 07:43
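
The `[^"]+` pattern explained above can also be applied with `String.prototype.matchAll` (available in ES2020 environments), which avoids the manual `exec` loop. This is only a sketch, assuming the feed always wraps href values in double quotes; the variable names `userHref`, `linkHref` and `commentsHref` are illustrative:

```javascript
var text = 'submitted by <a href="http://www.reddit.com/user/guiness_as_usual"> guiness_as_usual </a> <br/> ' +
    '<a href="https://www.spaceglasses.com/">[link]</a> ' +
    '<a href="http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/">[242 comments]</a>';

// matchAll requires the g flag; index 1 of each match is the captured URL.
var hrefs = Array.from(text.matchAll(/href="([^"]+)"/g), function (m) {
    return m[1];
});

// Destructure into separate variables; missing entries would be undefined.
var [userHref, linkHref, commentsHref] = hrefs;

console.log(linkHref);     // https://www.spaceglasses.com/
console.log(commentsHref); // http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/
```

As with the loop version, this depends on the markup keeping the same shape; if the number or order of links changes, the variables will hold different URLs or `undefined`.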