20

I need a regex to extract the contents of the href attributes from these a tags:

<p class="bc_shirt_delete">
   <a href="/CustomContentProcess.aspx?CCID=13524&amp;OID=3936923&amp;A=Delete" onclick="javascript:return confirm('Are You sure you want to delete this item?')">delete</a>
</p>

Just the URLs, not the href attribute names or the tags.

I'm parsing a plain text ajax request here, so I need a regex.

L84
Infra Stank

10 Answers

25

You can try this regex:

/href="([^\'\"]+)/g

Example at: http://regexr.com?333d1

Update: or easier via non greedy method:

/href="(.*?)"/g
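
A minimal sketch of how the non-greedy pattern might be applied in JavaScript to collect every captured URL. The variable names are illustrative assumptions, and `String.prototype.matchAll` assumes an ES2020-capable runtime.

```javascript
// Sample input taken from the question.
const html = '<p class="bc_shirt_delete">' +
  '<a href="/CustomContentProcess.aspx?CCID=13524&amp;OID=3936923&amp;A=Delete" ' +
  'onclick="javascript:return confirm(\'Are You sure you want to delete this item?\')">delete</a>' +
  '</p>';

// Capture group 1 holds the text between the double quotes.
const hrefPattern = /href="(.*?)"/g;

// matchAll yields one match per href; keep only the captured group.
const urls = Array.from(html.matchAll(hrefPattern), (m) => m[1]);

console.log(urls);
```

Note that the captured value is the raw attribute text, so HTML entities such as &amp; are not decoded.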
Niels
9

This will do it nicely. http://jsfiddle.net/grantk/cvBae/216/

Regex example: https://regex101.com/r/nLXheV/1

var str = '<p href="missme" class="test"><a href="/CustomContentProcess.aspx?CCID=13524&amp;OID=3936923&amp;A=Delete" onclick="">delete</a></p>'
    
var patt = /<a[^>]*href=["']([^"']*)["']/g;
var match; // declared explicitly to avoid creating an implicit global
while ((match = patt.exec(str))) {
  alert(match[1]);
}
gkiely
  • Works like charm, exactly what I was looking for. Thankyou – Ravi Dec 11 '20 at 13:24
  • `nopes` will fail, for example. Many other cases will fail as well. Sure, it works for the question; but just an advice for those who landed here searching for a regex. – Gogol Feb 24 '21 at 18:56
  • Thanks for pointing that out @Gogol. I've added an updated regex here: https://regex101.com/r/nLXheV/1 Let me know if there are any use cases that it might fail on. – gkiely Mar 01 '21 at 16:34
6

Here is a robust solution:

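let href_regex = /<a([^>]*?)href\s*=\s*(['"])([^\2]*?)\2\1*>/i,
    link_text = '<a href="/another-article/">another article link</a>',
    // match() keeps only the captured URL; replace() would leave the link text behind
    match = link_text.match ( href_regex ),
    href = match ? match[3] : null;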

Coloured href RegEx from http://www.regexr.com

What it does:

  • detects a tags
  • lazy skips over other HTML attributes and groups (1) so you DRY
  • matches href attribute
  • takes in consideration possible whitespace around =
  • makes a group (2) of ' and " so you DRY
  • matches anything but group (2) and groups (3) it
  • matches the group (2) of ' and "
  • matches the group (1) (other attributes)
  • matches whatever else is there until closing the tag
  • set proper flags i ignore case
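
A hedged variant of the pattern above, offered as a sketch rather than a definitive fix. In JavaScript, `\2` inside a character class is not treated as a backreference, and `\1*` only repeats the exact text captured by group 1, so tags with further attributes after href may fail to match. The version below (function and variable names are illustrative assumptions) uses a lazy wildcard up to the matching quote and `[^>]*` for any trailing attributes.

```javascript
// Group 1: the opening quote; group 2: the URL; \1 requires the same closing quote.
// (?:[^>]*?\s)? allows other attributes before href while requiring whitespace
// directly before it, so attributes such as data-href are not picked up.
const hrefRegex = /<a\s(?:[^>]*?\s)?href\s*=\s*(['"])(.*?)\1[^>]*>/i;

// Returns the href value of the first matching a tag, or null if none is found.
function extractHref(tag) {
  const match = hrefRegex.exec(tag);
  return match ? match[2] : null;
}

console.log(extractHref('<a href="/another-article/">another article link</a>'));
console.log(extractHref("<a class='x' href = '/b' target=\"_blank\">b</a>"));
```

Returning null for a non-match keeps the "no link found" case distinguishable from an empty href.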
jimasun
4

You may not need a regex to do that.

var o = document.getElementsByTagName('a');
var urls = [];
for (var i = 0; i < o.length; i++) {
   urls[i] = o[i].href;
}

If it is plain text, you can insert it into a hidden element (e.g. one styled with display: none) and then handle it the same way as described above.

SaidbakR
3

It might be easier to use jQuery

 var html = '<li><h2 class="saved_shirt_name">new shirt 1</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&amp;OID=3936923&amp;A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 2</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&amp;OID=3936924&amp;A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 3</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&amp;OID=3936925&amp;A=Delete">Delete Shirt</button></li>';
$(html).find('[data-href]');

Then iterate over each node.

UPDATE (because post updated)

Let html be your raw response

var matches = $(html).find('[href]');
var hrefs = [];
$.each(matches, function(i, el){ hrefs.push($(el).attr('href'));});
//hrefs is an array of matches
jermel
1

I combined a few solutions around and came up with this (Tested in .NET):

(?<=href=[\'\"])([^\'\"]+)

Explanation:

(?<=) : look-behind, so these characters are not included in the match

[\'\"] : matches either a single or a double quote

[^] : matches any character except those listed after '^'

+ : one or more occurrences of the preceding element

This works well and is not greedy with the quote, since it stops matching the moment it finds a quote.
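
For comparison, the same look-behind idea can be sketched in JavaScript, assuming a runtime that implements ES2018 look-behind assertions (engines without that support reject the pattern with a syntax error). The sample string and variable names are illustrative.

```javascript
const html = '<a href="/first">one</a> <a href=\'/second\'>two</a>';

// The look-behind asserts that href= and a quote precede the match
// without including them in the result.
const pattern = /(?<=href=['"])[^'"]+/g;

// With the g flag, match() returns all matched substrings, or null if there are none.
const urls = html.match(pattern) || [];
console.log(urls);
```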

EBFE
  • Look behind is not supported in JS. Check http://www.regexr.com. See my answer, not as tidy and compact though. – jimasun Nov 30 '16 at 11:22
0
var str = "";

str += "<p class=\"bc_shirt_delete\">";
str += "<a href=\"/CustomContentProcess.aspx?CCID=13524&amp;OID=3936923&amp;A=Delete\" onclick=\"javascript:return confirm('Are You sure you want to delete this item?')\">delete</a>";
str += "</p>";

var matches = [];

str.replace(/href=("|')(.*?)("|')/g, function(a, b, match) {
  matches.push(match);
});

console.log(matches);

or if you don't care about the href:

var matches = str.match(/href=("|')(.*?)("|')/);

console.log(matches);
Andreas Louv
0

What about spaces around the =? This code handles them:

var matches = str.match(/href( *)=( *)("|'*)(.*?)("|'*)( |>)/);
console.log(matches);
bummi
Alex
  • Good, but still needs some work. Check my answer on this page: http://stackoverflow.com/a/40887021/1867650 – jimasun Nov 30 '16 at 11:21
0

It's important to be non-greedy, and to cater for matching ' or " quotes.

var test = `<a href="#" class="foo bar"> banana 
        <a href='http://google.de/foo?yes=1&no=2' data-href='foobar'/>`;

test.replace(/href=(?:\'.*?\'|\".*?\")/gi,'');

disclaimer: The one thing it does not catch is html5 attribs data-href...

Frank N
0

In this specific case, this is probably the fastest preg_match pattern:

/f="([^"]*)/
  • captures all characters (letters, digits, newlines, etc.) from f=" up to, but not including, the next "; flags such as /is are unnecessary; returns null if empty

But if the source contains many other links, you need to make sure you match exactly the one you are looking for. You can do this by including more of the surrounding source in the pattern, for example (this of course depends on the site's markup):

/bc_shirt_delete">\s*<a href="([^"]*)
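
A minimal JavaScript sketch of the anchored pattern above, assuming the response contains the bc_shirt_delete paragraph from the question alongside other links. The sample markup and variable names are illustrative.

```javascript
const html =
  '<a href="/other">other</a>\n' +
  '<p class="bc_shirt_delete">\n' +
  '   <a href="/CustomContentProcess.aspx?CCID=13524&amp;OID=3936923&amp;A=Delete">delete</a>\n' +
  '</p>';

// Anchor on the surrounding class name so unrelated links are skipped.
const anchored = /bc_shirt_delete">\s*<a href="([^"]*)/;

const match = anchored.exec(html);
const deleteUrl = match ? match[1] : null;
console.log(deleteUrl);
```

The trade-off is that the pattern now depends on the exact markup, so a change in attribute order or class name would stop it from matching.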