Catching 3rd href in regexp javascript?

Question

The RSS reader returns:

Submitted by
<a href="http://www.reddit.com/user/guiness_as_usual">
    guiness_as_usual
</a><br/>
<a href="https://www.spaceglasses.com/">
    [link]
</a>
<a href="http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/">
    [242 comments]
</a>

I need to capture the 2nd and 3rd href attributes into two different variables, and I have to do it in JavaScript. Does anyone know how to capture these two values with a regular expression in JavaScript?

EDIT: I'm looking for exactly this, but in JavaScript: http://rubular.com/r/ESRimQsZHc. I want to be able to get result[0], result[1] and result[2].

You are probably going to get a stream of "don't parse HTML with regex". That is generally good advice if you cannot guarantee the structure of your input. Are you absolutely certain that the RSS reader will always return data in exactly the structure you've posted? — JDB, Aug 19 '13 at 15:40
This is a DOM fragment, you should probably be using DOM traversal methodologies to get at the values you seek (for example, jQuery would make this a very simple proposition). — Mike Brant, Aug 19 '13 at 15:57

MDEV · Answer 1 · 2013-08-20T08:03:05.900


You could use DOMParser like so:

// htmlStr holds the markup returned by the RSS reader
var parser = new DOMParser();
var tempDoc = parser.parseFromString(htmlStr, "text/html");
var anchors = tempDoc.getElementsByTagName('a');
var href2 = anchors[1].getAttribute("href"); // or anchors[1].href to get the fully qualified link
var href3 = anchors[2].getAttribute("href"); // or anchors[2].href to get the fully qualified link
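The comment about `.href` refers to the difference between the attribute text and the resolved link: `getAttribute("href")` returns the attribute exactly as written, while the `href` property returns it resolved against the document's base URL. Outside a browser, the standard `URL` constructor does roughly the same resolution, so this sketch uses it to illustrate. The relative path and base URL are hypothetical values chosen for the example.

```javascript
// Hypothetical values: a relative href as it might appear in the markup,
// and the base URL of the document containing it.
const rawHref = "/r/technology/comments/1kmdom/";
const baseUrl = "http://www.reddit.com/user/guiness_as_usual";

// Roughly what anchor.getAttribute("href") returns: the text as written.
const asWritten = rawHref;

// Roughly what anchor.href returns: the URL resolved against the base.
// An absolute-path reference ("/...") replaces the base URL's whole path.
const resolved = new URL(rawHref, baseUrl).href;

console.log(asWritten); // /r/technology/comments/1kmdom/
console.log(resolved);  // http://www.reddit.com/r/technology/comments/1kmdom/
```

The links in the question are already absolute, so for that sample both forms should give the same strings.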

edited Aug 20 '13 at 08:03

answered Aug 19 '13 at 15:45

MDEV


score 1 · Accepted Answer · edited May 23 '17 at 10:31

As the answers to this question explain, you can't reliably parse HTML with a regular expression. This answer shows how to parse HTML in JavaScript instead. So, try this:

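// yourRssString holds the markup returned by the RSS reader.
// Note: innerHTML on a detached element can still fetch images and run inline
// event handlers found in the markup, so be careful with untrusted feeds.
var el = document.createElement('div');
el.innerHTML = yourRssString;
var innerElements = el.getElementsByTagName('a');
var secondHref = innerElements[1].getAttribute('href'); // index 1 = 2nd anchor
var thirdHref = innerElements[2].getAttribute('href');  // index 2 = 3rd anchor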
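The snippet above indexes straight into the anchor collection, so if a feed item has fewer than three links, `innerElements[2]` is `undefined` and calling `getAttribute` on it throws a `TypeError`. A minimal defensive variant is sketched below, assuming only that the collection is array-like and its items have `getAttribute`. The helper names `hrefAt` and `mockAnchor` are hypothetical, and the mock objects stand in for real `<a>` elements so the sketch runs outside a browser.

```javascript
// Hypothetical helper: returns the href of the anchor at `index` (0-based),
// or null when the collection has no element at that position.
// `anchors` can be any array-like of elements, such as the HTMLCollection
// returned by getElementsByTagName('a').
function hrefAt(anchors, index) {
  const anchor = anchors[index];
  return anchor ? anchor.getAttribute("href") : null;
}

// Mock anchor standing in for a real <a> element (assumption: only
// getAttribute is needed by hrefAt).
function mockAnchor(href) {
  return { getAttribute: (name) => (name === "href" ? href : null) };
}

// A feed item that, unlike the question's sample, has only two links.
const anchors = [
  mockAnchor("http://www.reddit.com/user/guiness_as_usual"),
  mockAnchor("https://www.spaceglasses.com/"),
];

const secondHref = hrefAt(anchors, 1); // "https://www.spaceglasses.com/"
const thirdHref = hrefAt(anchors, 2);  // null: there is no third anchor
```

In a browser, the same helper would be called as `hrefAt(el.getElementsByTagName('a'), 2)`.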

score 1 · Answer 3 · answered Aug 19 '13 at 15:54


If you absolutely need to use a regular expression, you can try this:

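var text = 'submitted by <a href="http://www.reddit.com/user/guiness_as_usual"> guiness_as_usual </a> <br/> <a href="https://www.spaceglasses.com/">[link]</a> <a href="http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/">[242 comments]</a>',
    hrefs = [],
    search = /href="([^"]+)"/g,
    match;
// With the g flag, exec() returns the next match on each call and null when there are no more
while ((match = search.exec(text)) !== null) {
    hrefs.push(match[1]); // match[1] is the captured URL, without the surrounding href="..."
}

document.write(hrefs[1]); // https://www.spaceglasses.com/
document.write(hrefs[2]); // http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/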

It's simple and works with your example.
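On engines that support `String.prototype.matchAll` (added in ES2020, so much newer than this answer), the same regular expression can collect every captured URL without a manual `exec` loop, and array destructuring then puts the 2nd and 3rd values into two separate variables, which is what the question asks for. This is a sketch using the same pattern; `text` here is the question's sample reduced to its three anchors, and the variable names are illustrative.

```javascript
// The question's sample, reduced to its three anchors.
const text =
  '<a href="http://www.reddit.com/user/guiness_as_usual">guiness_as_usual</a><br/>' +
  '<a href="https://www.spaceglasses.com/">[link]</a>' +
  '<a href="http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/">[242 comments]</a>';

// matchAll requires the g flag; each match's [1] is the captured URL.
const hrefs = Array.from(text.matchAll(/href="([^"]+)"/g), (m) => m[1]);

// Skip the first href and bind the 2nd and 3rd to their own variables.
const [, linkHref, commentsHref] = hrefs;

console.log(linkHref);     // https://www.spaceglasses.com/
console.log(commentsHref); // http://www.reddit.com/r/technology/comments/1kmdom/meta_glasses_become_a_real_life_iron_man/
```

`hrefs` plays the role of the result[0], result[1] and result[2] mentioned in the question's edit.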

answered Aug 19 '13 at 15:54

FlorianL


@user2686462: `[^"]` means: _all characters, except `"`_ – ProgramFOX Aug 19 '13 at 16:20
@user2686462 As ProgramFOX said: `["]+` means one or more consecutive `"` characters. Adding `^` inside the brackets, as in `[^"]+`, means one or more consecutive characters that are not `"`. Since an href value is enclosed between two `"`, to get the complete URL we keep the characters that are not `"` between those quotes. Can you accept my answer if it solves your problem? – FlorianL Aug 20 '13 at 07:43
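To see why the negated class matters, compare it with a greedy `.+` on a line containing two links: `.+` runs on to the last `"` in the string, swallowing everything in between, while `[^"]+` cannot cross a quote and stops at the end of the first value. The two-link input below is a made-up example.

```javascript
// Made-up input with two href attributes on one line.
const twoLinks = '<a href="a.html">A</a> <a href="b.html">B</a>';

// Greedy .+ consumes to the end, then backtracks only as far as the LAST ".
const greedy = twoLinks.match(/href="(.+)"/)[1];
// -> 'a.html">A</a> <a href="b.html'

// [^"]+ cannot cross a quote, so it stops at the end of the first value.
const negated = twoLinks.match(/href="([^"]+)"/)[1];
// -> 'a.html'

// ["]+ (without ^) matches only runs of quote characters.
const quotesOnly = 'x""y'.match(/["]+/)[0];
// -> '""'
```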
