Match zero or more times not working properly for zero matches

Question

I'm trying to get the URL and link text from these 2 types of URLs:

<a href="http://www.example.com">Example</a>
<a href="http://www.example.com" rel="nofollow">Example</a>

At first I had this:

text = text.replace(/<a href="(.*)">(.*)<\/a>/gim, "[$2]($1)");

But that includes rel="nofollow" in $2 for the 2nd example. I changed it to:

text = text.replace(/<a href="(.*)"( rel=".*"{0,})>(.*)<\/a>/gim, "[$3]($1)");

Now, the rel="nofollow" link is perfect, but the first example isn't matched at all.

{0,} should mean "match rel=".*" 0 or more times".

What am I doing wrong?

I think you should use jQuery or something similar, rather than rewriting a DOM parser... – pataluc, Jul 01 '13 at 12:04
@pataluc jQuery is seriously overkill for something as simple as this. I know the format of the URLs is either one of the two examples. It won't contain any other attributes. — Danny Beckett, Jul 01 '13 at 12:05
I assumed that, given your example, you'd have further use for jQuery... ^^ my bad. – pataluc, Jul 01 '13 at 12:12

score 3 · Accepted Answer · edited Jul 01 '13 at 22:34

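Your expression says "find a quote zero or more times" ;) The {0,} quantifier applies only to the " immediately before it, not to the whole rel=".*" part, so the ` rel="` text is still required and the first link never matches.

Use this instead: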

text = text.replace(/<a href="([^"]*)"[^>]*>(.*?)<\/a>/gim, "[$2]($1)");
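
To see the difference, here is a minimal sketch that runs both patterns against the two links from the question. The variable names (`plain`, `nofollow`, `original`, `fixed`, and the result names) are only illustrative and not part of the original code.

```javascript
// The two sample links from the question.
const plain = '<a href="http://www.example.com">Example</a>';
const nofollow = '<a href="http://www.example.com" rel="nofollow">Example</a>';

// Original attempt: {0,} binds only to the final quote, and the group
// ( rel=".*"{0,}) itself has no quantifier, so ` rel="` is always required.
const original = /<a href="(.*)"( rel=".*"{0,})>(.*)<\/a>/gim;

// Fixed pattern: [^"]* stops at the closing quote of href,
// and [^>]* skips whatever other attributes follow (possibly none).
const fixed = /<a href="([^"]*)"[^>]*>(.*?)<\/a>/gim;

const originalPlain = plain.replace(original, "[$3]($1)");
const originalNofollow = nofollow.replace(original, "[$3]($1)");
const fixedPlain = plain.replace(fixed, "[$2]($1)");
const fixedNofollow = nofollow.replace(fixed, "[$2]($1)");

console.log(originalPlain);    // unchanged: the pattern does not match
console.log(originalNofollow); // [Example](http://www.example.com)
console.log(fixedPlain);       // [Example](http://www.example.com)
console.log(fixedNofollow);    // [Example](http://www.example.com)
```

Note that the fixed pattern has only two capturing groups, so the replacement string refers to $2 for the link text.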

edited Jul 01 '13 at 22:34

Danny Beckett

answered Jul 01 '13 at 12:02

Marvin Emil Brach

OUCH... updated the code, try again... now group 1 matches all chars but the quote itself ;) – Marvin Emil Brach Jul 01 '13 at 12:18
I added non-greedy matching so that the expression still works if there is more than one URL in `text`. – Danny Beckett Jul 01 '13 at 22:35
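
A small sketch of what the non-greedy change does, assuming a hypothetical input with two links on one line (`twoLinks`, `greedy` and `lazy` are made-up names for illustration):

```javascript
// Hypothetical input with two links on the same line (not from the question).
const twoLinks = '<a href="http://a.example">A</a> and <a href="http://b.example">B</a>';

// Greedy (.*) runs to the last </a> on the line, so both links collapse into one match.
const greedy = twoLinks.replace(/<a href="([^"]*)"[^>]*>(.*)<\/a>/gim, "[$2]($1)");

// Lazy (.*?) stops at the first </a>, so each link is converted on its own.
const lazy = twoLinks.replace(/<a href="([^"]*)"[^>]*>(.*?)<\/a>/gim, "[$2]($1)");

console.log(greedy); // [A</a> and <a href="http://b.example">B](http://a.example)
console.log(lazy);   // [A](http://a.example) and [B](http://b.example)
```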

000 · Answer 2 · 2013-07-01T12:14:07.297

jQuery solution:

var text_html_string = '<a href="http://www.example.com">Example</a>';
$(text_html_string).attr('href'); // http://www.example.com
$(text_html_string).text(); // Example

Edit: without jQuery:

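// Parse the markup in a detached element; it is never added to the page.
var d = document.createElement('div');
d.innerHTML = '<a href="http://www.example.com">Example</a>';
var a = d.getElementsByTagName('a')[0];
// getAttribute returns the value as written; the .href property returns the resolved URL (with a trailing slash here).
a.getAttribute('href'); // http://www.example.com
a.textContent; // Example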

edited Jul 01 '13 at 12:14

answered Jul 01 '13 at 12:04

000

I don't want to add a 237kb behemoth to a 4kb userscript. – Danny Beckett Jul 01 '13 at 12:06
Look at Zepto.js then. It is a **seriously** lightweight jQuery alternative without as many bells and whistles. – 000 Jul 01 '13 at 12:08
For the sake of a simple RegEx change, it's definitely overkill to integrate any library (did I mention I hate frameworks? `:p`) – Danny Beckett Jul 01 '13 at 12:09
Alright how about that. I updated with a native javascript alternative that won't break every time your html changes. – 000 Jul 01 '13 at 12:14
That's actually pretty nice. Why I hadn't thought of this, I don't know. Good job! – Danny Beckett Jul 01 '13 at 12:15
Creating one or two DOM objects for extracting a portion of a string? O.K., it works, but elegant it is not ;) – Marvin Emil Brach Jul 01 '13 at 12:21
@MarvinEmilBrach I actually think this is a better solution, since it will always work, regardless of which attributes a URL has. I accepted your answer though, since it directly fixes the RegEx problem, and I know which attributes this URL will have. – Danny Beckett Jul 01 '13 at 12:25
I just didn't want to give the Y part of this X-Y Problem any legitimacy: http://stackoverflow.com/a/1732454/1253312 – 000 Jul 01 '13 at 12:27
<a[^>]*href="([^"]*)"[^>]*>(.*)<\/a> works in all cases, too. – Marvin Emil Brach Jul 01 '13 at 12:28
It never hits the DOM; it always stays in memory and doesn't cause a re-render. It's actually a pretty cheap operation. – 000 Jul 01 '13 at 16:43
I didn't mean that it causes a re-render; as you say, it never gets added to the model. But **pretty cheap** is not the cheapest ;) So I always try to avoid object creation when it isn't needed (I come from Java). On second thought, it could have less of an impact because of the prototyping used in JavaScript... I'll take a closer look at that in time. – Marvin Emil Brach Jul 02 '13 at 06:34
