I have an HTML file and I want to select all anchors that follow a specific pattern: the href can be any text, but it must end with a `/` character and be followed by target="_blank". The pattern looks like this:

<a href="AnyTextHereFollowingByThatChar/" target="_blank">

I wrote the following regex:

\<a\s*href\=\"(.*?)"\s*target\="_blank"

But this regex starts at the first anchor it finds and keeps matching until it reaches the keyword target on some other anchor, so the match includes every character in between.
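A minimal reproduction of this behaviour, sketched in Python (an assumption, since no language is named here; the sample HTML string is hypothetical as well). The lazy `.*?` is free to consume quotes and tags, so it expands until `target="_blank"` finally appears. The second pattern shows one common way to keep the capture inside a single attribute value, by replacing `.*?` with `[^"]*`:

```python
import re

# Hypothetical input: the first anchor has no target attribute, the second does.
html = '<a href="first/">one</a> <a href="second/" target="_blank">two</a>'

# The pattern from the question, with the unnecessary escapes dropped.
greedy_across_tags = re.compile(r'<a\s*href="(.*?)"\s*target="_blank"')

# "." also matches quotes and angle brackets, so the capture runs from the
# first href into the second anchor.
captured = greedy_across_tags.search(html).group(1)
print(captured)

# Possible fix: [^"]* cannot cross the closing quote of the href value.
bounded = re.compile(r'<a\s*href="([^"]*)"\s*target="_blank"')
bounded_hrefs = bounded.findall(html)
print(bounded_hrefs)
```

With this input the first pattern captures text from both anchors, while the second captures only the href of the anchor that actually carries target="_blank".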

I would appreciate any help matching only the anchors of the form <a href="AnyTextHereFollowingByThatChar/" target="_blank">

Muhammad Hassan
  • Regex for HTML is fraught with problems; if this is nested in a larger HTML document, I'd consider using http://html-agility-pack.net/ – TheGeneral Sep 05 '18 at 06:37
  • what is this html-agility-pack.net @TheGeneral – Muhammad Hassan Sep 05 '18 at 06:40
  • It's a dedicated *Jedi* HTML parsing library, and will make short work of your HTML dilemmas. Well... after an abrupt learning curve and several more SO questions – TheGeneral Sep 05 '18 at 06:41
  • Seconded on not attempting to do this with a direct regex. [HtmlAgilityPack](https://www.nuget.org/packages/HtmlAgilityPack/) will let you extract object representations of the elements and attributes you're looking for, that can be more safely queried in the manner you're attempting. – T2PS Sep 05 '18 at 06:43
  • Parsing HTML with regular expressions can have [unfortunate effects](https://stackoverflow.com/a/1732454/67392) on one's mental state: don't do it. – Richard Sep 05 '18 at 06:44
  • *Even Jon Skeet or Chuck Norris cannot parse HTML using regular expressions* – TheGeneral Sep 05 '18 at 06:47
  • With html/xml file, why don't you use XML Validator - XSD (XML Schema) instead of regex? – Nhan Phan Sep 05 '18 at 06:52
  • Thank you all for your time. I arrived at the correct regex :) This is the regex that I need: \<a\s*href\="(?<value>[a-zA-Z0-9]+[^/])*\/"\s*target\="_blank"> – Muhammad Hassan Sep 05 '18 at 08:59
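
The parser-based approach suggested in the comments can be sketched roughly as follows. HtmlAgilityPack is a .NET library; this sketch uses Python's standard `html.parser` module as a stand-in, which is an assumption and not what the commenters recommended. The class name `BlankTargetAnchors` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class BlankTargetAnchors(HTMLParser):
    """Collects href values of <a> tags whose href ends in "/" and whose target is _blank."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <a href>), so fall back to "".
        href = attributes.get("href") or ""
        if href.endswith("/") and attributes.get("target") == "_blank":
            self.hrefs.append(href)


# Hypothetical sample covering attribute order and a missing trailing slash.
sample = (
    '<a href="first/">one</a>'
    '<a href="second/" target="_blank">two</a>'
    '<a target="_blank" href="third/">three</a>'
    '<a href="fourth" target="_blank">four</a>'
)

parser = BlankTargetAnchors()
parser.feed(sample)
found = parser.hrefs
print(found)
```

Because the parser hands over attributes as name/value pairs, the check does not depend on attribute order or on the spacing between attributes, which is where a hand-written regex tends to break.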

1 Answer

Finally, I arrived at the regex that I need:

\<a\s*href\="(?<value>[a-zA-Z0-9]+[^/])*\/"\s*target\="_blank">

This regex selects only the anchors I need, as described in the question above.
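
A quick way to try this pattern, sketched in Python as an assumption (the `(?<value>...)` syntax above is written `(?P<value>...)` in Python's `re` module, and the sample HTML is hypothetical):

```python
import re

# Same pattern as above, with the named group in Python syntax and the
# unnecessary escapes dropped.
pattern = re.compile(
    r'<a\s*href="(?P<value>[a-zA-Z0-9]+[^/])*/"\s*target="_blank">'
)

# Hypothetical input: only the second anchor has target="_blank".
html = '<a href="first/">one</a> <a href="second/" target="_blank">two</a>'

found = list(pattern.finditer(html))
matches = [m.group(0) for m in found]
values = [m.group("value") for m in found]
print(matches)
print(values)

# An href with a slash in the middle is not matched, because neither
# [a-zA-Z0-9] nor [^/] can consume the interior "/".
nested = pattern.search('<a href="docs/api/" target="_blank">')
print(nested)
```

Two things are worth noting when trying it: the named group does not include the trailing `/`, and href values that contain a `/` before the final one do not match at all, so this pattern fits only single-segment values like the one in the question.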

Muhammad Hassan