15

I am generating HTML with HtmlAgilityPack and it works well, but the HTML output is not indented. I can get indented XML, but I need HTML. Is there a way to do this?

HtmlDocument doc = new HtmlDocument();

// gen html
HtmlNode table = doc.CreateElement("table");
table.Attributes.Add("class", "tableClass");
HtmlNode tr = doc.CreateElement("tr");
table.ChildNodes.Append(tr);
HtmlNode td = doc.CreateElement("td");
td.InnerHtml = "&mdash;";
tr.ChildNodes.Append(td);

// write text, no indent :(
using (StreamWriter sw = new StreamWriter("table.html"))
{
    table.WriteTo(sw);
}

// write xml, nicely indented but it's XML!
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Indent = true;
settings.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlWriter xw = XmlWriter.Create("table.xml", settings))
{
    table.WriteTo(xw);
}
Petr Abdulin

4 Answers

8

Fast, Reliable, Pure C#, .NET Core compatible AngleSharp

You can parse the markup with AngleSharp, which provides a way to auto-indent the output:

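// Namespaces as of AngleSharp 0.10 and later:
// using System.IO; using AngleSharp; using AngleSharp.Html; using AngleSharp.Html.Parser;
var parser = new HtmlParser();
var document = parser.ParseDocument(text);   // text: the unindented HTML string
string indentedText;
using (var writer = new StringWriter())
{
    document.ToHtml(writer, new PrettyMarkupFormatter
                            {
                                Indentation = "\t",
                                NewLine = "\n"
                            });
    // declared outside the using block so the result stays in scope
    indentedText = writer.ToString();
}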
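
As a rough sketch of building the question's table directly in AngleSharp instead of parsing existing text, something like the following might work. The class and method names (`AngleSharpTableDemo`, `BuildIndentedTable`) are made up for illustration, and the namespaces assume AngleSharp 0.10 or later. Every call here is synchronous, so no async code is needed on the caller's side.

```csharp
using System.IO;
using AngleSharp;               // ToHtml extension method
using AngleSharp.Html;          // PrettyMarkupFormatter
using AngleSharp.Html.Parser;   // HtmlParser

// Hypothetical demo class; not part of AngleSharp.
public static class AngleSharpTableDemo
{
    public static string BuildIndentedTable()
    {
        // Parsing an empty string synchronously yields a document
        // that can be used as a factory for new elements.
        var parser = new HtmlParser();
        var document = parser.ParseDocument(string.Empty);

        // Same structure as the table in the question.
        var table = document.CreateElement("table");
        table.SetAttribute("class", "tableClass");
        var tr = document.CreateElement("tr");
        table.AppendChild(tr);
        var td = document.CreateElement("td");
        td.TextContent = "\u2014"; // em dash character
        tr.AppendChild(td);

        // Serialize only the table element, with indentation.
        using (var writer = new StringWriter())
        {
            table.ToHtml(writer, new PrettyMarkupFormatter
            {
                Indentation = "\t",
                NewLine = "\n"
            });
            return writer.ToString();
        }
    }
}
```

Calling `AngleSharpTableDemo.BuildIndentedTable()` returns the markup as a string, which can then be written to a file with `File.WriteAllText`.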
Fab
  • Just FYI, this requires async code and/or some kind of wrapper for async. @Fab, can you point to an example of how to call this library from non-async C#? – qxotk Apr 28 '20 at 22:13
  • @qxotk Sorry, but I cannot see where the "async" code is in the above. Do you mean AngleSharp parses or renders the HTML in an asynchronous way (like delegating work to another thread)? – Fab Apr 29 '20 at 06:32
  • Yes, AngleSharp uses async, and that propagates back to client code. I tried the library but did not have time to learn how to insulate my code from async by using tasks. Do you know how to call the AngleSharp code without using async in the client code? – qxotk May 07 '20 at 11:20
  • You can completely replace Html Agility Pack with AngleSharp and create the whole document in it. – pistipanko Nov 02 '20 at 12:55
  • Worked for me in 2021. Thanks! – Victor Zakharov Sep 13 '21 at 20:11
6

No, and it's a "by design" choice. There is a big difference between HTML and XML (or XHTML, which is XML, not HTML), where, most of the time, whitespace has no specific meaning.

This is not such a minor improvement, because changing whitespace can change the way some browsers render a given HTML chunk, especially malformed HTML (which the library generally handles well). The Html Agility Pack was designed to preserve the way the HTML is rendered, not to tidy up the way the markup is written.

I'm not saying it's infeasible or plain impossible. Obviously you can convert to XML and voilà (and you could write an extension method to make this easier), but in the general case the rendered output may be different.
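
A minimal sketch of such an extension method, reusing the `XmlWriterSettings` from the question, could look like this. The names `HtmlNodeExtensions` and `ToIndentedXml` are hypothetical and not part of Html Agility Pack. The result is XML rather than HTML, so the caveat above about differing rendered output still applies.

```csharp
using System.Text;
using System.Xml;
using HtmlAgilityPack;

// Hypothetical helper class; not part of Html Agility Pack.
public static class HtmlNodeExtensions
{
    // Serializes the node as an indented XML fragment and returns it as a string.
    public static string ToIndentedXml(this HtmlNode node)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            Indent = true,
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var sb = new StringBuilder();
        using (XmlWriter xw = XmlWriter.Create(sb, settings))
        {
            node.WriteTo(xw);
        }
        // The writer is disposed (and therefore flushed) before the string is read.
        return sb.ToString();
    }
}
```

With this in scope, the question's example reduces to `string xml = table.ToIndentedXml();`.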

Simon Mourier
  • Well, obviously I'm not an expert in HTML, but conversion to XML doesn't work the way I wish. I was using LINQ to XML to generate HTML, but switched to HAP because of these subtle differences between XML and HTML. E.g. I can't output the `&mdash;` HTML entity; it's transformed into `&amp;mdash;`. Anyway, thanks for the info! – Petr Abdulin May 11 '11 at 19:12
  • `most of the times - whitespaces are no specific meaning` is only correct for machines, not humans; we do need whitespace to visualize and understand what is written. – Akash Kava Jan 02 '16 at 07:59
4

As far as I know, HtmlAgilityPack cannot do this. But you could look through the HTML Tidy packages suggested in similar questions:

Community
Oleks
  • Looks like it's true; another lib is needed to solve such a simple task. HtmlAgilityPack surely needs this minor improvement. – Petr Abdulin May 10 '11 at 11:29
  • So, there's no way to make it not spit out minimized HTML? (i.e. the HTML that comes out of agility pack is not readable, it's all condensed on one line, there's no way to insert line breaks or tabs just to make it readable?) – BrainSlugs83 Jul 08 '15 at 23:09
1

I had the same experience: although HtmlAgilityPack is great for reading and modifying HTML (or, in my case, ASP) files, you cannot create readable output with it.

However, I ended up writing a few lines of code that work for me:

Having a HtmlDocument named "m_htmlDocument" I create my HTML file as follows:

using (var file = new System.IO.StreamWriter(_sFullPath))
{
    // disposing the writer flushes the output and closes the file
    if (m_htmlDocument.DocumentNode != null)
        foreach (var node in m_htmlDocument.DocumentNode.ChildNodes)
            WriteNode(file, node, 0);
}

and

void WriteNode(System.IO.StreamWriter _file, HtmlNode _node, int _indentLevel)
    {
        // check parameter
        if (_file == null) return;
        if (_node == null) return;

        // init 
        string INDENT = " ";
        string NEW_LINE = System.Environment.NewLine;

        // case: no children
        if(_node.HasChildNodes == false)
        {
            for (int i = 0; i < _indentLevel; i++)
                _file.Write(INDENT);
            _file.Write(_node.OuterHtml);
            _file.Write(NEW_LINE);
        }

        // case: node has children
        else
        {
            // indent
            for (int i = 0; i < _indentLevel; i++)
                _file.Write(INDENT);

            // open tag (each attribute gets a leading space, so no stray space ends up before '>')
            _file.Write(string.Format("<{0}", _node.Name));
            if(_node.HasAttributes)
                foreach(var attr in _node.Attributes)
                    _file.Write(string.Format(" {0}=\"{1}\"", attr.Name, attr.Value));
            _file.Write(string.Format(">{0}",NEW_LINE));

            // children
            foreach(var chldNode in _node.ChildNodes)
                WriteNode(_file, chldNode, _indentLevel + 1);

            // close tag
            for (int i = 0; i < _indentLevel; i++)
                _file.Write(INDENT);
            _file.Write(string.Format("</{0}>{1}", _node.Name,NEW_LINE));
        }
    }
Chris