Using regex to parse HTML is not recommended.
Regex is meant for regularly occurring patterns, and HTML's format is not that regular (XHTML, which has to be well-formed XML, is stricter). For example, an HTML file is still valid even if some elements have no closing tag (a <p> or <li> can be left open), and that can break your code.
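To illustrate the missing-closing-tag problem, here is a minimal Python sketch. Python is used only for illustration, and the input string is a made-up example. The list below is valid HTML because the </li> end tags are optional, yet a regex that expects them finds nothing.

```python
import re

# Made-up example: valid HTML in which the optional </li> end tags are omitted.
html = "<ul><li>one<li>two</ul>"

# This regex assumes every <li> is closed by </li>, so it matches nothing here.
items = re.findall(r"<li>(.*?)</li>", html)
print(items)  # []
```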
Use an HTML parser such as HtmlAgilityPack instead.
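HtmlAgilityPack is a .NET library. As a stand-in, this sketch uses Python's standard-library html.parser module to show the general idea: a parser copes with the unclosed <li> tags that defeat the regex. TextCollector is a hypothetical helper name chosen for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects non-empty text nodes and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        if data.strip():
            self.chunks.append(data.strip())


parser = TextCollector()
parser.feed("<ul><li>one<li>two</ul>")  # same made-up input, </li> tags omitted
parser.close()                          # flush any buffered text
print(parser.chunks)  # ['one', 'two']
```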
WARNING: don't try this in your code.
That said, here is how to solve your regex problem:
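The pattern <.*> matches < followed by zero or more characters (e.g. u>rag</u) up to the last > on the line, so your replace removes everything from the first < to the last >, including the text in between.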
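Here is a minimal Python sketch of that greedy behaviour. It assumes the input was "<u>rag</u>", which is inferred from the u>rag</u fragment. Python's re module is a stand-in for whatever regex engine the question uses.

```python
import re

# Assumed input, inferred from the "u>rag</u" fragment above.
html = "<u>rag</u>"

# Greedy: .* runs to the end of the line, then backs off only to the LAST '>'.
greedy_match = re.search(r"<.*>", html).group()
stripped = re.sub(r"<.*>", "", html)

print(greedy_match)    # <u>rag</u>
print(repr(stripped))  # '' (the whole string, text included, was removed)
```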
You should replace it with this regex:
<.*?>
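Below is a minimal Python sketch of the lazy pattern. It uses the same assumed "<u>rag</u>" input and Python's re module as an illustrative stand-in. Each tag is now matched separately, so only the tags are removed.

```python
import re

# Assumed input, inferred from the "u>rag</u" fragment in the question.
html = "<u>rag</u>"

# Lazy: .*? stops at the FIRST '>' after each '<', so each tag matches on its own.
tags = re.findall(r"<.*?>", html)
stripped = re.sub(r"<.*?>", "", html)

print(tags)      # ['<u>', '</u>']
print(stripped)  # rag
```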
.* is greedy, i.e. it consumes as many characters as it can while still allowing a match.
.*? is lazy, i.e. it consumes as few characters as possible.
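The same greedy/lazy difference applies outside HTML. This small Python sketch uses a made-up quoted string: the greedy quantifier spans from the first quote to the last, and the lazy one stops at the nearest closing quote.

```python
import re

# Made-up input containing two quoted words.
text = 'say "hi" and "bye"'

greedy = re.findall(r'".*"', text)   # one match, from the first quote to the last
lazy = re.findall(r'".*?"', text)    # two matches, each ending at the nearest quote

print(greedy)  # ['"hi" and "bye"']
print(lazy)    # ['"hi"', '"bye"']
```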