
Is there a library in .NET that can clean up an HTML document and remove its unclosed tags?

skaffman
ryudice
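To make "remove unclosed tags" concrete, here is a rough sketch of the idea in Python rather than .NET, using only the standard library's html.parser. It is not how Html Agility Pack or HTML Tidy work internally. It simply drops any start tag that is never closed and any end tag with no matching opener, and keeps the text in between. The function name remove_unclosed_tags and the list of void elements (tags like <br> that never take a closing tag) are assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Assumption: HTML void elements never take a closing tag, so they are kept as-is.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _TagBalancer(HTMLParser):
    """Re-emits the input while tracking which start tags actually get closed."""

    def __init__(self):
        # Keep entity/char references raw so "&amp;" is not unescaped in the output.
        super().__init__(convert_charrefs=False)
        self.out = []        # output pieces; a None entry marks a dropped start tag
        self.open_tags = []  # stack of (tag name, index of its start tag in self.out)

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw text, attributes intact
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, len(self.out) - 1))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing, e.g. <br/>

    def handle_endtag(self, tag):
        if all(name != tag for name, _ in self.open_tags):
            return  # stray end tag with no opener: drop it
        # Anything opened after the matching start tag was never closed: drop it.
        while True:
            name, index = self.open_tags.pop()
            if name == tag:
                break
            self.out[index] = None
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def remove_unclosed_tags(html: str) -> str:
    parser = _TagBalancer()
    parser.feed(html)
    parser.close()
    # Start tags still on the stack at the end of input were never closed.
    for _, index in parser.open_tags:
        parser.out[index] = None
    return "".join(piece for piece in parser.out if piece is not None)


print(remove_unclosed_tags("<p>Hello <b>world</p>"))  # <p>Hello world</p>
```

This is deliberately naive: it also treats implicitly closed HTML, such as consecutive <li> items without </li>, as unclosed and removes those tags, which a real HTML cleaner would handle more gracefully.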

2 Answers


Html Agility Pack

http://www.codeplex.com/htmlagilitypack

Luke Schafer
  • Sorry to bother you again. I've tried to use Html Agility Pack but wasn't successful. I created a new HtmlDocument, passing the string containing the HTML I want to fix to the constructor; however, I need to get the document back as a string, and I don't know how to do that. – ryudice Dec 02 '09 at 03:03
  • I parsed my text using the HtmlDocument class, but the unclosed tags are still there. Is there a way to remove them? – ryudice Dec 02 '09 at 03:13
  • Off the top of my head I can't remember, but try OutputAsXml; there's also an option to fix nested tags, though I'm not sure under what circumstances it works. – Luke Schafer Dec 02 '09 at 04:34
  • Luke, I believe you're referring to the answer I just gave to my own question. http://stackoverflow.com/questions/2175071/how-would-i-get-the-inputs-from-a-certain-form-with-htmlagility-pack-lang-c-ne – codygman Feb 04 '10 at 15:05
  • I wasn't; I've used it before. But that's a great post, and thanks for sharing. – Luke Schafer Feb 04 '10 at 23:26

HTML Tidy!

See the url below for more details:

http://www.devx.com/dotnet/Article/20505/0/page/2

The project itself (source and downloads) is at:

http://tidy.sourceforge.net/

I gave the other link because it covers a .NET wrapper and how to set everything up. Hope this helps!

codygman
  • For C#, the specific link is TidyManaged, a project maintained by Mark Beaton: https://github.com/markbeaton/TidyManaged – wonea Jan 10 '11 at 17:33