-3

I have this regex:

(?=<a .*href=")(.+)(?=".*>My Text<\/a>)

With this, I am trying to extract the href value from a specific <a> tag in an HTML document.

Let's say I have this HTML:

<html>
<head>
    ...
</head>
<body>
    ...
     <a class="..." href="..." ..="..">My Text</a>
    ...
</body>
</html>

With my regex I get <a class="..." href="..." ..=".. (the match stops just before the final ">), but I want only the href value.

Edit: this answer, "regular expression for finding 'href' value of a <a> link", doesn't help me. With that regex I get every <a> tag along with all of its attributes.

KunLun

2 Answers

2

Consider using an HTML parser instead. Regex often isn't powerful enough to parse HTML. For the example you posted, and fairly limited variations of it, the following should work:

<a[\s\S]*?href="([^"]+)"[\s\S]*?>
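As a rough illustration, here is how the pattern above might be applied in Python. The sample HTML, the variable names, and the stricter second pattern (which also requires the link text "My Text") are assumptions added for this sketch, not part of the original answer.

```python
import re

# The pattern from the answer: group 1 is the href value of the first <a> found.
HREF_IN_ANCHOR = re.compile(r'<a[\s\S]*?href="([^"]+)"[\s\S]*?>')

# Hypothetical variation: only match the <a> whose text is exactly "My Text".
# [^>] keeps the match inside a single opening tag.
HREF_FOR_MY_TEXT = re.compile(r'<a\b[^>]*?\bhref="([^"]+)"[^>]*>My Text</a>')

# Made-up sample input with two links.
sample_two = (
    '<a class="x" href="/other">Other</a> '
    '<a class="btn" href="/my/page" id="l">My Text</a>'
)

first_href = HREF_IN_ANCHOR.search(sample_two).group(1)
my_text_href = HREF_FOR_MY_TEXT.search(sample_two).group(1)

print(first_href)    # /other  (the first link, whatever its text is)
print(my_text_href)  # /my/page
```

Note that the answer's pattern does not look at the link text at all, so on a page with several links it returns the first one. Both patterns assume double-quoted attribute values.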

Demo
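For the parser route suggested above, a minimal sketch using only Python's standard-library `html.parser` could look like the following. The class name `LinkFinder`, the helper `find_hrefs`, and the sample HTML are hypothetical names chosen for this example. It collects the href of every <a> whose text equals a given string.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect href values of <a> tags whose text equals target_text."""

    def __init__(self, target_text):
        super().__init__()
        self.target_text = target_text
        self.in_anchor = False     # True while inside an open <a> tag
        self.current_href = None   # href of the <a> currently open, if any
        self.text_parts = []       # text seen inside the current <a>
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.current_href = dict(attrs).get("href")
            self.text_parts = []

    def handle_data(self, data):
        if self.in_anchor:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            text = "".join(self.text_parts).strip()
            if text == self.target_text and self.current_href is not None:
                self.matches.append(self.current_href)
            self.in_anchor = False


def find_hrefs(html, target_text):
    """Return the href of every <a> whose text is target_text."""
    finder = LinkFinder(target_text)
    finder.feed(html)
    finder.close()
    return finder.matches


sample = (
    '<body><a class="x" href="/other">Other</a> '
    '<a class="btn" href="/my/page" id="l">My Text</a></body>'
)
print(find_hrefs(sample, "My Text"))  # ['/my/page']
```

Unlike the regex, this does not depend on attribute order or quoting style, since the parser hands over the attributes already split into name/value pairs.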

Nick Reed
1

You can use this regex to capture the value of the href attribute:

Regex :

<a .*? href="(.*?)".*?>(?>.*?<\/a>)

Explanation :

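.*? ==> matches any characters, non-greedily (as few as possible)

href="(.*?)" ==> the capturing group that holds the href value

(?>.*?<\/a>) ==> an atomic group (not a look-ahead) that consumes everything up to the closing </a> tag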

Demo : Here
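A small Python sketch of this regex, under a couple of assumptions: Python's `re` module only supports atomic groups `(?>...)` from version 3.11, so the sketch swaps in a plain non-capturing group `(?:...)`, which gives the same result on this input. The sample HTML line and the names below are made up for illustration.

```python
import re

# Same shape as the regex above, with (?:...) in place of the atomic group
# so that it also runs on Python versions older than 3.11.
ANCHOR_HREF = re.compile(r'<a .*? href="(.*?)".*?>(?:.*?</a>)')

# Hypothetical sample line with one link.
line = '     <a class="btn" href="/my/page" id="l">My Text</a>'

match = ANCHOR_HREF.search(line)
href_value = match.group(1)
print(href_value)  # /my/page
```

Because the pattern has a literal space on both sides of the first `.*?`, it expects at least one other attribute before href; a tag written as `<a href="...">` would not match without adjusting the pattern.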

lagripe