
Don't yell at me!

I have seen many threads claiming that HTML cannot be properly parsed with regex.

I do not believe this is so. NB: I love regex and try to use it everywhere I can.

Please convince me with any of the following:
1) HTML code that cannot be properly parsed with regex
2) an authority on the subject saying so
3) a personal example of where using regex to parse HTML went horribly wrong for you.

Thank you, and I hope this clears the subject up for me.

circusdei
  • This is not actually a question at all. – Brian Driscoll Aug 16 '11 at 13:19
    Have you *read* these "many threads?" There are all sorts of opinions bundled in [this one](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – Linus Kleen Aug 16 '11 at 13:20
  • http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege – Alohci Aug 16 '11 at 13:28

1 Answer


Can you find every paragraph in the following code using a regular expression?

```html
<p  class =  "hello"    >
    Hello World
    <!-- I'm a comment, so <p>the tags</p> inside me must be ignored! -->

<P CLASS=hello>Hello World again!</p >

<p class="<p>">
    Hey, what about some CDATA? <![CDATA[ Let's put some <p> here too! ]]>
</p>

<p/>
    Good bye!
```
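To make the challenge concrete, here is a minimal Python sketch that runs one typical naive pattern against the sample above and compares it with the standard library's `html.parser`. The pattern `<p\b[^>]*>(.*?)</p>` is my own assumption of what a "find every paragraph" regex usually looks like, not something proposed in this thread, and the names `NAIVE_P`, `ParagraphCounter`, `count_with_regex` and `count_with_parser` are hypothetical.

```python
import re
from html.parser import HTMLParser

# The HTML sample from the answer, copied verbatim.
SAMPLE = """<p  class =  "hello"    >
    Hello World
    <!-- I'm a comment, so <p>the tags</p> inside me must be ignored! -->

<P CLASS=hello>Hello World again!</p >

<p class="<p>">
    Hey, what about some CDATA? <![CDATA[ Let's put some <p> here too! ]]>
</p>

<p/>
    Good bye!
"""

# Assumed naive pattern: "<p", anything up to the first ">", then the
# shortest run of text (newlines included) up to the next "</p>".
NAIVE_P = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL)


class ParagraphCounter(HTMLParser):
    """Counts <p> start tags as reported by the stdlib tokenizer."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <P> and <p> are treated alike.
        # A self-closing <p/> also ends up here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag == "p":
            self.count += 1


def count_with_regex(html):
    # One match per "paragraph" the naive pattern believes it found.
    return len(NAIVE_P.findall(html))


def count_with_parser(html):
    parser = ParagraphCounter()
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    print("regex :", count_with_regex(SAMPLE))
    print("parser:", count_with_parser(SAMPLE))
    # Show what the regex actually captured for each "paragraph".
    for body in NAIVE_P.findall(SAMPLE):
        print(repr(body))
```

On this sample the regex reports 2 paragraphs and the parser 4. The first regex match runs into the comment and stops at the `</p>` inside it. The second treats the `>` inside `class="<p>"` as the end of the tag. `<P CLASS=hello>` is missed because the pattern is case-sensitive, and the paragraph opened by `<p/>` is missed because it has no `</p>`. Counting start tags (rather than open/close pairs) is deliberate, since HTML lets `</p>` be omitted. Note that `html.parser` is only a tokenizer: it does not infer implicit end tags, so a tree-building parser (for example html5lib) would be needed to recover where each paragraph actually ends.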
Arseni Mourzenko