I'm trying to get the URL and link text from these 2 types of URLs:

<a href="http://www.example.com">Example</a>
<a href="http://www.example.com" rel="nofollow">Example</a>

At first I had this:

text = text.replace(/<a href="(.*)">(.*)<\/a>/gim, "[$2]($1)");

But that includes rel="nofollow" in $2 for the 2nd example. I changed it to:

text = text.replace(/<a href="(.*)"( rel=".*"{0,})>(.*)<\/a>/gim, "[$3]($1)");

Now, the rel="nofollow" link is perfect, but the first example isn't matched at all.

{0,} should mean "match rel=".*" 0 or more times".

What am I doing wrong?

Danny Beckett
  • I think you should use jQuery or something like it, rather than rewriting a DOM parser... – pataluc Jul 01 '13 at 12:04
  • @pataluc jQuery is seriously overkill for something as simple as this. I know the format of the URLs is either one of the two examples. It won't contain any other attributes. – Danny Beckett Jul 01 '13 at 12:05
  • I assumed that, given your example, you'd have further use for jQuery... ^^ my bad. – pataluc Jul 01 '13 at 12:12

2 Answers

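Your expression says "find a quote zero or more times" ;) The {0,} quantifier binds only to the token right before it, which is the closing ", not the whole rel=".*" part, so a space followed by rel=" is still required and the first example cannot match.

Use this instead: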

text = text.replace(/<a href="([^"]*)"[^>]*>(.*?)<\/a>/gim, "[$2]($1)");
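
For comparison, here is a minimal sketch of how the optional rel attribute from the question could be expressed directly, by wrapping the whole attribute in a non-capturing group and making that group optional. The function name linksToMarkdown and the sample input are hypothetical, and the sketch assumes the links only ever take one of the two forms shown in the question.

```javascript
// Hypothetical helper name; not part of the original posts.
function linksToMarkdown(html) {
  // ([^"]*)            stops at the first closing quote, so the href
  //                    cannot swallow later attributes.
  // (?: rel="[^"]*")?  makes the WHOLE rel attribute optional,
  //                    not just its final quote.
  // (.*?)              is lazy, so each match ends at the nearest </a>.
  return html.replace(
    /<a href="([^"]*)"(?: rel="[^"]*")?>(.*?)<\/a>/gi,
    "[$2]($1)"
  );
}

// Assumed sample input covering both link forms from the question.
const input =
  '<a href="http://www.example.com">Example</a> and ' +
  '<a href="http://www.example.org" rel="nofollow">Other</a>';

console.log(linksToMarkdown(input));
// [Example](http://www.example.com) and [Other](http://www.example.org)
```

Because the group is non-capturing, the link text stays in $2, so the replacement string is the same as in the original one-attribute version.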
Marvin Emil Brach

jQuery solution:

var text_html_string = '<a href="http://www.example.com">Example</a>';
$(text_html_string).attr('href'); // http://www.example.com
$(text_html_string).text(); // Example

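Edit: without jQuery:

var d = document.createElement('div');
d.innerHTML = '<a href="http://www.example.com">Example</a>';
var a = d.getElementsByTagName('a')[0];
a.getAttribute('href'); // http://www.example.com (a.href would return the resolved URL instead)
a.textContent; // Example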
000
  • I don't want to add a 237kb behemoth to a 4kb userscript. – Danny Beckett Jul 01 '13 at 12:06
  • Look at Zeptojs then. It is a **seriously** lightweight jQuery alternative without as many bells and whistles. – 000 Jul 01 '13 at 12:08
  • For the sake of a simple RegEx change, it's definitely overkill to integrate any library (did I mention I hate frameworks? `:p`) – Danny Beckett Jul 01 '13 at 12:09
  • Alright how about that. I updated with a native javascript alternative that won't break every time your html changes. – 000 Jul 01 '13 at 12:14
  • That's actually pretty nice. Why I hadn't thought of this, I don't know. Good job! – Danny Beckett Jul 01 '13 at 12:15
  • Creating one or two DOM objects just to extract a portion of a string? OK, it works, but elegant it is not ;) – Marvin Emil Brach Jul 01 '13 at 12:21
  • @MarvinEmilBrach I actually think this is a better solution, since it will always work, regardless of which attributes a URL has. I accepted your answer though, since it directly fixes the RegEx problem, and I know which attributes this URL will have. – Danny Beckett Jul 01 '13 at 12:25
  • I just didn't want to give the Y part of this X-Y Problem any legitimacy: http://stackoverflow.com/a/1732454/1253312 – 000 Jul 01 '13 at 12:27
  • <a[^>]*href="([^"]*)"[^>]*>(.*)<\/a> works in all cases, too. – Marvin Emil Brach Jul 01 '13 at 12:28
  • It never hits the DOM. always stays in memory and doesn't cause a re-render. Actually pretty cheap operation. – 000 Jul 01 '13 at 16:43
  • I didn't mean that it causes a re-render; as you say, it never gets added to the model. But **pretty cheap** is not the cheapest ;) So I always try to avoid object creation when it's not needed (I come from Java). On second thought, it could have less of an impact because of the prototyping used in JavaScript... I'll take a closer look at that in time. – Marvin Emil Brach Jul 02 '13 at 06:34