1

I need to find a link tag using a regex. I have this line in my HTML file:

<link rel="stylesheet" href="<c:url value="/styles/folders/masterTree.css" />" type="text/css" media="screen, print" />

I need a regular expression that matches this tag. This is not homework; I need it for a requirement at work.

Thanks to all in advance.

moinudin
Viidhya
  • why do you need regex? and what other cases are you concerned about? is spacing an issue? are mismatched quotes an issue? how much of the link do you need to match? do you want the `href` value as well? – zzzzBov Jan 11 '11 at 22:15
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Amber Jan 11 '11 at 22:16
  • Thanks for your response. I need to replace the above line with a String value, because I am converting the HTML to PDF. I don't need the `href` value; the regex should just return the entire tag so I can call .replace on it. – Viidhya Jan 11 '11 at 22:18
  • Please reconsider using a regular expression. Look [here](http://stackoverflow.com/questions/238036/java-html-parsing) for several Java html parser recommendations. – moinudin Jan 11 '11 at 23:04

2 Answers

2

Using regex to parse HTML is problematic because most (X)HTML in the wild is not actually valid.

The edge cases pile up, and the pattern ends up breaking before long.

You don't say which language you are developing in, but if you are working in .NET, I would suggest looking into HtmlAgilityPack.

rtpHarry
  • This is how I solved it: `html = html.replaceFirst("(<link[^>]*styles/folders/masterTree.css[^>]*>)", MASTER_TREE_CSS);` – Viidhya Jan 13 '11 at 11:39
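
One thing to watch with that approach: reading the pattern in the comment above as `<link[^>]*styles/folders/masterTree.css[^>]*>` (the `<link[^>` prefix is an assumption), the final `[^>]*>` stops at the first `>`, which in the line from the question is the one closing the nested `<c:url ... />`, not the one closing the link tag. Here is a minimal Java sketch of the difference. The class name, constant names, and the stricter pattern are hypothetical and only fit this exact tag shape; this is not a general HTML matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkTagReplacer {
    // The line from the question, with the JSP <c:url ... /> tag nested inside href.
    static final String LINK_LINE =
        "<link rel=\"stylesheet\" href=\"<c:url value=\"/styles/folders/masterTree.css\" />\""
        + " type=\"text/css\" media=\"screen, print\" />";

    // Pattern as read from the comment (prefix "<link[^>" is assumed):
    // its match ends at the first '>' after the CSS path.
    static final Pattern SHORT_PATTERN =
        Pattern.compile("<link[^>]*styles/folders/masterTree.css[^>]*>");

    // Hypothetical stricter pattern: consume the nested <c:url ... /> first,
    // then continue to the '>' that closes the <link> tag itself.
    static final Pattern FULL_PATTERN = Pattern.compile(
        "<link\\b[^>]*<c:url\\b[^>]*styles/folders/masterTree\\.css[^>]*/>[^>]*>");

    static String replaceFirst(Pattern pattern, String html, String replacement) {
        // quoteReplacement keeps any '$' or '\' in the replacement text literal.
        return pattern.matcher(html).replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // prints: MASTER_TREE_CSS" type="text/css" media="screen, print" />
        System.out.println(replaceFirst(SHORT_PATTERN, LINK_LINE, "MASTER_TREE_CSS"));
        // prints: MASTER_TREE_CSS
        System.out.println(replaceFirst(FULL_PATTERN, LINK_LINE, "MASTER_TREE_CSS"));
    }
}
```

If the leftover attributes do not matter for the PDF conversion, the shorter pattern may be good enough, but the stricter one removes the whole tag. Both break as soon as the markup changes shape, which is the point the answers make about using a parser instead.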
1

You shouldn't. A real HTML parser is the only reliable way to parse HTML.

Chuck