Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I have a file containing about 2000 lines like these:
<nobr> <a href="../Carbon_Monoxide_Poisoning_Prevention.htm"><b>poisoning - prevention</b></a></nobr><br>
<nobr> <a href="../Carbon_Monoxide_Symptoms.htm"><b>symptoms</b></a></nobr><br>
1.) the URL is ALWAYS in the form of ../foo.html
2.) the display name is SOMETIMES enclosed in <b> ... </b> tags, and sometimes not.
3.) each line in the file contains up to four leading spacing characters (before the link) that I need to count and flag as spaces. These will EVENTUALLY be used to format indents, so I need to capture that count somehow.
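For rule 3, one way to count the indent with a regex is to capture everything between `<nobr>` and `<a` and then count the spacing units inside it. The post does not show exactly which characters make up the indent, so this Java sketch assumes they are literal spaces, non-breaking space characters, or `&nbsp;` entities; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndentCounter {
    // Assumption: the indent sits between <nobr> and <a and consists of
    // literal spaces, non-breaking spaces (U+00A0) and/or &nbsp; entities.
    private static final Pattern INDENT_REGION =
            Pattern.compile("<nobr>((?:&nbsp;|[ \u00A0])*)<a\\b");
    // Matches a single spacing unit inside the captured region.
    private static final Pattern ONE_SPACE = Pattern.compile("&nbsp;|[ \u00A0]");

    /** Returns the number of spacing units before the <a> tag, or -1 if the line does not match. */
    static int countIndent(String line) {
        Matcher region = INDENT_REGION.matcher(line);
        if (!region.find()) {
            return -1;
        }
        // Count each spacing unit found in the captured group.
        Matcher unit = ONE_SPACE.matcher(region.group(1));
        int count = 0;
        while (unit.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countIndent("<nobr>&nbsp;&nbsp;&nbsp;&nbsp;<a href=\"../x.htm\">x</a></nobr><br>")); // 4
        System.out.println(countIndent("<nobr>  <a href=\"../y.htm\">y</a></nobr><br>")); // 2
    }
}
```

The two-step approach (capture the region, then count inside it) keeps the counting rule separate from the line shape, so if the indent turns out to use some other character, only `ONE_SPACE` and the region pattern need to change.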
I need to write the hyperlink, display name, and number of spaces to a comma-delimited flat file, as follows (based on the data above):
../Carbon_Monoxide_Poisoning_Prevention.htm,poisoning - prevention,4
../Carbon_Monoxide_Symptoms.htm,symptoms,4
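Putting the three rules together, a single regex with capture groups can produce each output row directly. This is a sketch in Java (the post does not name a language; Java is an assumption based on the mention of String and substring). It assumes every line looks like the samples: `href` is the first attribute of `<a>` and is double-quoted, and the indent is made of spaces and/or `&nbsp;` entities. The file names `links.htm` and `links.csv` are hypothetical defaults.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkListToCsv {
    // Group 1: the indent before <a> (assumed: spaces, U+00A0 and/or &nbsp;)
    // Group 2: the href value, e.g. ../Carbon_Monoxide_Symptoms.htm
    // Group 3: the display name; the optional <b>...</b> wrapper is matched but not captured
    private static final Pattern LINE = Pattern.compile(
            "<nobr>((?:&nbsp;|[ \u00A0])*)<a href=\"([^\"]+)\">(?:<b>)?(.*?)(?:</b>)?</a>");

    /** Converts one HTML line to "href,name,indent", or returns null if it does not match. */
    static String toCsvRow(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Treat each &nbsp; entity as one space, then the length is the indent.
        int indent = m.group(1).replace("&nbsp;", " ").length();
        return m.group(2) + "," + m.group(3) + "," + indent;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass real paths as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "links.htm");
        Path out = Paths.get(args.length > 1 ? args[1] : "links.csv");
        List<String> rows = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            String row = toCsvRow(line);
            if (row != null) {
                rows.add(row);
            } else if (!line.trim().isEmpty()) {
                System.err.println("Skipped unrecognized line: " + line);
            }
        }
        Files.write(out, rows, StandardCharsets.UTF_8);
    }
}
```

The lazy `(.*?)` stops at the first `</b></a>` or `</a>`, which is what lets the optional `(?:<b>)?` and `(?:</b>)?` groups absorb the bold tags when present. Display names that contain commas would need CSV quoting, which this sketch does not do. A line-oriented regex is workable here only because every line has the same fixed shape; for arbitrary HTML, a real HTML parser is the safer choice.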
While I could parse this with a whole mess of String, substring, and if statements, that seems more cumbersome than it needs to be. I was investigating regex (my first time doing so), but I am a little unclear on some of the syntax. I learn best by seeing a code sample similar to my application, but I have not been able to find anything that quite fits.
Any help would be appreciated!