22

I would like to check whether a given string is syntactically correct HTML. I don't know which HTML elements it will contain; the only thing I know is that the string should be well-formed HTML.

Does anyone have an idea how to check this in C#?

ravenik
  • 2
    Also: http://stackoverflow.com/a/1732454/1583 – Oded Dec 15 '11 at 11:19
  • 4
    for your own good...please remove the regex tag :) – Marek Dec 15 '11 at 11:23
  • I tried exactly this regex, <([a-z]+) *[^/]*?>, but it doesn't work properly all the time. It didn't find comments, for example. I also thought about loading the string into an XML structure and then checking it, but I'm not sure that is the most efficient way... – ravenik Dec 15 '11 at 11:25
  • 1
    see if the below link can help you http://htmlagilitypack.codeplex.com/ – Pavan Dec 15 '11 at 11:21
  • 3
    @ravenik, HTML is **not a regular language**. Do not use Regex to parse HTML! –  Dec 15 '11 at 13:31
  • possible duplicate of [Using C#, how do I validate a html file?](http://stackoverflow.com/questions/3853882/using-c-how-do-i-validate-a-html-file) – Jeroen Aug 03 '15 at 06:40

1 Answer

36

You can use the Html Agility Pack: http://html-agility-pack.net/?z=codeplex. Load the string into an HtmlDocument and check its ParseErrors collection:

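using System.Linq;
using HtmlAgilityPack;

string html = "<span>Hello world</sspan>";

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// ParseErrors holds the problems recorded while parsing (here, the mismatched </sspan>).
if (doc.ParseErrors.Any())
{
    // Invalid HTML
}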
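If it helps to see why a string was rejected, the same check can be wrapped in a small helper that returns the recorded errors rather than just a yes/no answer. This is only a sketch: `HtmlChecker` and `GetParseErrors` are hypothetical names introduced here, not part of Html Agility Pack, and it assumes the HtmlAgilityPack NuGet package is referenced. Note also that the parser is deliberately tolerant, so an empty `ParseErrors` collection most likely means "no problems the parser noticed" and should not be read as full validation against the HTML specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HtmlChecker
{
    // Hypothetical helper (not part of Html Agility Pack): loads the string
    // and returns whatever parse errors the library recorded.
    public static IReadOnlyList<HtmlParseError> GetParseErrors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.ParseErrors.ToList();
    }

    public static void Main()
    {
        foreach (var error in GetParseErrors("<span>Hello world</sspan>"))
        {
            // Code, Line, LinePosition and Reason are properties of HtmlParseError.
            Console.WriteLine(
                $"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
        }
    }
}
```

Returning the list keeps the decision with the caller: `GetParseErrors(html).Count == 0` gives the same boolean as the check above, while the individual entries can be logged or shown to the user.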
carla
Romain Meresse