1

I'm trying to match all the content between the two tags <a and </a>.

My page always has the same structure:

<a class="applink" href="myLINK" target="..." onClick="..."><img src="..." border="0" alt="..." title="..." align=bottom hspace=3 width="32" height="32"><br>xxxxx</br></a>

I would like to match every part of the HTML that looks like this.

That is, <a class="applink" [...] </a>, including nested tags such as <img, so [^>]* alone is not enough.

Kobi
yrejk

4 Answers

2

A better approach here is to use an HTML parser. For example, the Html Agility Pack:

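using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://jsbin.com/enico4/"); // this works!
// XPath string comparison is case-sensitive; SelectNodes returns null when nothing matches.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@class='appLink']");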

You can also get each link's HTML if you need it, but links is already the collection you need.

IEnumerable<string> appLinks = links.Select(link => link.InnerHtml); // needs System.Linq; use OuterHtml to include the <a> tag itself

(the code here is C#, but it should translate easily to VB.Net)

Kobi
  • Thanks for your approach. The problem is that I work with a WebBrowser, and I can't (or don't know how to) join my HTTP request with this WebBrowser: the cookies are lost, so the site asks for authentication again. But thanks for your solution. I did try an XML parser, but the page is not valid, so I get many errors :( – yrejk Dec 22 '10 at 14:36
  • @yrejk - I'm not sure I understand. You already have the HTML, don't you? You use the regex on it. The Agility Pack can take HTML from a string, you don't have to load it from a browser or a web request. – Kobi Dec 22 '10 at 15:18
  • @yrejk The point here is that you may instantiate the `HtmlDocument` with an HTML *string*, you do not even need to fetch the data from the Web again after you get the HTML data with your WebBrowser. – Wiktor Stribiżew Oct 13 '17 at 08:31
1

HTML parsing is a bit tricky with regex, but this should work for many cases:

<a\s.*?href\s*=\s*"appLink"[^>]*>(.|\s)*?</a>

This will match elements with an href="appLink".

You might want to consider using the .NET XML parsing code.
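For comparison, here is a minimal sketch of a variant that keys on the class attribute from the question rather than on href. The class name `AppLinkRegex`, the method `Extract`, and the sample HTML are made up for illustration. It assumes no attribute value in the opening tag contains a literal `>` and that `applink` anchors are not nested inside each other. `RegexOptions.Singleline` lets `.` cross line breaks, and `RegexOptions.IgnoreCase` accepts both `applink` and `appLink`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AppLinkRegex
{
    // [^>]* only spans the attributes of the opening tag.
    // .*? (lazy, with Singleline) spans the element content, nested tags
    // such as <img> included, and stops at the nearest </a>.
    private static readonly Regex Pattern = new Regex(
        @"<a\s[^>]*\bclass\s*=\s*""applink""[^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the full text of every matching <a ...>...</a> element.
    public static List<string> Extract(string html)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(html))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample input, not the asker's real page.
        string html =
            "<p><a class=\"applink\" href=\"one\"><img src=\"a.png\"><br>first\n</a></p>" +
            "<a href=\"other\">skip me</a>" +
            "<a class=\"applink\" href=\"two\">second</a>";

        foreach (string link in Extract(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

On the sample input this should yield the first and third anchors and skip the one without the class. On irregular markup a parser remains the more robust choice.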

Jason
  • I want every A tag with class="appLink", including any tags nested inside it... And can you give me some information about the XML parser class? – yrejk Dec 22 '10 at 13:27
  • I have removed the newlines, so with ']+>(.*?)' it's OK... But I want to refine this to match only ...href="appl... or ...href="laun...! Thanks – yrejk Dec 22 '10 at 13:52
0

This should solve it for you: <a .*?</a>

Because of the space after <a, this does NOT match tags like <address>, <abbr>, ...

sjngm
-1
<a.*</a>

OR

<a class="applink".*</a>
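A small sketch of how a greedy `.*` behaves differently from a lazy `.*?` when a line holds more than one link, and why the character right after `<a` matters. The class `GreedyVsLazy`, its helper methods, and the sample strings are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Number of non-overlapping matches of pattern in input.
    public static int Count(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    // Text of the first match, or null when there is none.
    public static string First(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string twoLinks = "<a href=\"x\">one</a> text <a href=\"y\">two</a>";
        // Greedy: runs from the first <a to the LAST </a>, so both links merge.
        Console.WriteLine(Count("<a.*</a>", twoLinks));   // prints 1
        // Lazy: stops at the nearest </a>, so each link is its own match.
        Console.WriteLine(Count("<a .*?</a>", twoLinks)); // prints 2

        string withAbbr = "<abbr>x</abbr> <a href=\"x\">one</a>";
        // Without a space after <a, the match already starts at <abbr>.
        Console.WriteLine(First("<a.*</a>", withAbbr));
        // With the space, only the real anchor is matched.
        Console.WriteLine(First("<a .*?</a>", withAbbr));
    }
}
```

Note that without `RegexOptions.Singleline` neither pattern crosses a line break, so anchors that span several lines would be missed.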
Francis