
Is there a tool, library, or function in C# that tabifies or indents generated HTML code without validating or tidying the input?

Edit:

The goal is to indent HTML code generated by JavaScript text editors, including but not limited to TinyMCE. HtmlTextWriter is not an option. The solution must not expect valid XML/XHTML/HTML.

Requirements:

  • Add a new line before and after opening and closing tags.
  • Indent content inside tags (Tab or 4 Spaces).
  • Split a long line (having N number of words) into multiple indented lines.
  • Do not change the input, even if it is not valid HTML. Only tabify/indent and split long lines.

Up to this point, I have:

private string FormatHtml(string input)
{
    //Opening tags
    Regex r = new Regex("<([a-z]+) *[^/]*?>");
    string retVal = string.Empty;
    retVal = r.Replace(input, string.Format("$&{0}\t", Environment.NewLine));

    //Closing tags
    r = new Regex("</[^>]*>");
    retVal = r.Replace(retVal, string.Format("{0}$&{0}", Environment.NewLine));

    //Self closing tags
    r = new Regex("<[^>/]*/>");
    retVal = r.Replace(retVal, string.Format("$&{0}", Environment.NewLine));

    return retVal;
}
Nick Binnet
  • [HtmlTextWriter](http://msdn.microsoft.com/en-us/library/system.web.ui.htmltextwriter.aspx) – jrummell Feb 10 '12 at 16:08
  • Can you provide a sample of your input, and a sample of your expected output? – Steven Schroeder Feb 10 '12 at 16:11
  • http://www.manoli.net/csharpformat/ web-based, but provides C# source that you could integrate into another application. – 3Dave Feb 10 '12 at 16:20
  • That does not indent code at all. :) – Nick Binnet Feb 10 '12 at 16:22
  • _If_ the input HTML is also valid XML, you can use the XmlWriter with appropriate [XmlWriterSettings](http://msdn.microsoft.com/en-us/library/system.xml.xmlwritersettings.aspx). – Joshua Honig Feb 10 '12 at 16:30
  • If the HTML input is not valid, you'll have a hard time reindenting it, as browsers render invalid HTML the way they want. Whitespace can be significant, especially in the case of malformed HTML. Anyway, you can use the HTML Agility Pack library at least to parse it. See here on SO for more on this: http://stackoverflow.com/questions/846994/how-to-use-html-agility-pack – Simon Mourier Feb 17 '12 at 17:29

2 Answers


You might want to rethink your approach: inserting newlines (and indenting) can cause serious whitespace problems.

<span style="color:red">test</span><span>ing</span>

The HTML above does not render the same as the indented version below: the added line breaks introduce extra whitespace, so the output reads "test ing" instead of "testing".

<span style="color:red">
    test
</span>
<span>
    ing
</span>

You should only insert a newline if there is already whitespace present.
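
As a rough illustration of that rule, here is a minimal sketch that turns a gap into a line break only when whitespace already sits between two tags. The class and method names (`WhitespaceSafeBreaker`, `BreakAtExistingWhitespace`) are made up for this example. It assumes no `>` or `<` characters appear inside attribute values or text, and it does not special-case `<pre>`, `<textarea>`, scripts, or comments, where changing whitespace can still alter the result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceSafeBreaker
{
    // Matches a run of whitespace that sits directly between '>' and '<'.
    // The lookarounds leave the tags themselves untouched.
    private static readonly Regex GapBetweenTags = new Regex(@"(?<=>)\s+(?=<)");

    // Replaces existing whitespace between tags with a single line break.
    // Tags that touch each other (no whitespace between them) are left alone,
    // so no new whitespace is introduced into the rendered page.
    public static string BreakAtExistingWhitespace(string html)
    {
        return GapBetweenTags.Replace(html, Environment.NewLine);
    }
}
```

With this sketch, `<span>test</span><span>ing</span>` comes back unchanged, while `<p>a</p>  <p>b</p>` gets a line break in place of the two spaces. Indentation could be layered on top by appending tabs after the line break, but only at these same positions.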

Willem

This may be a bit of a long-winded way of doing it, but it's the only thing I can think of off the top of my head.

Use an SGML converter (e.g. HTML Agility Pack or SgmlReader) to convert the HTML to XML.

You could then write out to an XmlTextWriter and specify in the settings that you want indents.
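
A minimal sketch of the second step, assuming the markup has already been converted to well-formed XML (the conversion itself is not shown). `XmlIndenter` is a hypothetical helper name; it uses `XmlWriter.Create` with `XmlWriterSettings` rather than `XmlTextWriter` directly. Note that parsing and re-serializing may normalize the input (for example attribute quoting), so this does not satisfy a strict "do not change the input" requirement.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlIndenter
{
    // Re-serializes well-formed XML with indentation.
    // Throws XmlException if the input is not well-formed.
    public static string Indent(string wellFormedXml)
    {
        XDocument doc = XDocument.Parse(wellFormedXml);

        var settings = new XmlWriterSettings
        {
            Indent = true,                      // put child elements on their own lines
            IndentChars = "    ",               // four spaces per level
            NewLineChars = Environment.NewLine,
            OmitXmlDeclaration = true           // no <?xml ...?> header in the output
        };

        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            doc.Save(writer);
        }
        return sb.ToString();
    }
}
```

For example, `XmlIndenter.Indent("<div><p>text</p></div>")` puts the `<p>` element on its own line, indented by four spaces, while elements that contain only text stay on one line.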

Fen
  • Instead of making any changes to the input, how could one only indent the contents inside html tags? – Nick Binnet Feb 10 '12 at 16:36
  • In that case, have you looked at HTML Tidy (http://tidy.sourceforge.net/)? There is a .NET binding available at http://markbeaton.com/SoftwareInfo.aspx?ID=81a0ecd0-c41c-48da-8a39-f10c8aa3f931 – Fen Feb 10 '12 at 16:42
  • The referenced library "libtidy.dll" crashes Windows Server 2008 R2 SP1 64 bit. Very unstable. :( – Nick Binnet Feb 10 '12 at 17:32
  • Still, judging by the above link, it does not indent code either. – Nick Binnet Feb 10 '12 at 17:34
  • The HTML Tidy documentation shows that it can indent code (http://tidy.sourceforge.net/docs/Overview.html), so I'm not sure what is going on there. As for the crash, the most likely cause is that the project is compiled as Any CPU, which I have had issues with in the past. Set it to either x86 or x64. – Fen Feb 13 '12 at 10:00