
I have the following HTML content:

<html><head>
<title>Simple</title>


</head>
<body>
<div id="Content" style="padding: 5px;">
<p><a href="http://confluence:8080/download/attachments/8618175/Text.txt?version=1&modificationDate=1484637732181">Text.txt</a><br/>
<span class="image-wrap" style=""><img src="http://confluence:8080/download/attachments/8618175/add-button-blue-hi.png?version=1&modificationDate=1484562338796" style="border: 1px solid black" /></span><br/>
<span class="image-wrap" style=""><a class="confluence-thumbnail-link 300x200" href='http://confluence:8080/download/attachments/8618175/attachment.jpg'><img src="http://confluence:8080/download/thumbnails/8618175/attachment.jpg" style="border: 1px solid black" /></a></span></p>
</div>
</body></html>

Here I have two <a> tags. I need the href value of the <a> node whose nested <img> has the 'src' attribute found in the second <a> node, i.e. "http://confluence:8080/download/thumbnails/8618175/attachment.jpg". I have a variable, say string x, that contains this value, and I need to get the href of the <a> node based on it.

Right now I am using "href\s*=\s*(?:\"(?<1>[^\"]*)\"|(?<1>\S+))", but it gives me the href values of all the nodes.

  • This is really a very good case to handle with [HtmlAgilityPack](http://www.codeplex.com/htmlagilitypack). Get all `a` tags that have `src` equal to your value using XPath, and then just get their href values. – Wiktor Stribiżew Jan 25 '17 at 14:32
  • Also: http://stackoverflow.com/questions/1496619/regex-to-get-the-link-in-href-asp-net?rq=1 – Magnetron Jan 25 '17 at 14:34

1 Answer


I completely agree with Wiktor S. here: HTML Agility Pack is a much more robust solution than regex. But if you must use regex, try this:

<a[^>]*href\s*=(?<HRef>[^>]+)>

Tested here: https://regex101.com/r/XuGjc5/1
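
Note that the pattern above still matches every <a> tag and captures the surrounding quotes along with the value. To pick only the link that wraps a known image src, something along these lines may help. This is a minimal sketch in Python rather than C#, using the standard library's html.parser in place of HTML Agility Pack. The names HrefBySrcFinder, find_hrefs, href_by_src_regex, SAMPLE and x are made up for illustration, and the regex variant assumes the <img> immediately follows the opening <a> tag, as it does in the question's markup.

```python
import re
from html.parser import HTMLParser

# Trimmed copy of the markup from the question (illustrative only).
SAMPLE = """<p><a href="http://confluence:8080/download/attachments/8618175/Text.txt?version=1">Text.txt</a><br/>
<span class="image-wrap" style=""><a class="confluence-thumbnail-link 300x200" href='http://confluence:8080/download/attachments/8618175/attachment.jpg'><img src="http://confluence:8080/download/thumbnails/8618175/attachment.jpg" style="border: 1px solid black" /></a></span></p>"""


class HrefBySrcFinder(HTMLParser):
    """Collect href values of <a> tags that wrap an <img> with a given src."""

    def __init__(self, target_src):
        super().__init__()
        self.target_src = target_src
        self.open_hrefs = []  # href of each currently open <a>, innermost last
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.open_hrefs.append(attrs.get("href"))
        elif tag == "img" and attrs.get("src") == self.target_src:
            # Record the href of the innermost enclosing <a>, if it has one.
            if self.open_hrefs and self.open_hrefs[-1] is not None:
                self.matches.append(self.open_hrefs[-1])

    def handle_endtag(self, tag):
        if tag == "a" and self.open_hrefs:
            self.open_hrefs.pop()


def find_hrefs(html, target_src):
    """Parser-based lookup: every href whose <a> contains the target image."""
    finder = HrefBySrcFinder(target_src)
    finder.feed(html)
    return finder.matches


def href_by_src_regex(html, target_src):
    """Regex-only fallback: <a ... href=...> directly followed by the <img>."""
    pattern = re.compile(
        r"""<a\b[^>]*?\bhref\s*=\s*(["'])(?P<href>.*?)\1[^>]*>\s*"""
        r"""<img\b[^>]*?\bsrc\s*=\s*(["'])""" + re.escape(target_src) + r"\3",
        re.IGNORECASE,
    )
    match = pattern.search(html)
    return match.group("href") if match else None


x = "http://confluence:8080/download/thumbnails/8618175/attachment.jpg"
print(find_hrefs(SAMPLE, x))
print(href_by_src_regex(SAMPLE, x))
```

The parser version keeps working if attributes are reordered or other elements sit between the <a> and the <img>; the regex version is tied to the exact layout shown and is likely to break on anything less regular.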

Nick Allan