
This is my first question here, so bear with me. My problem is this: I'm building an XML IDE for an internal language. One of its features should be auto-indenting the XML via a command, similar to what Visual Studio and other editors offer.

What I need is to turn the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<note>
        <to>Tove</to>

    <from>Jani</from>
        <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>

</note>

Into:

<?xml version="1.0" encoding="UTF-8"?>
<note>
    <to>Tove</to>

    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>

</note>

That is, fix the indentation but touch nothing else (in particular, keep the blank lines). Is this possible in C# without writing an algorithm from scratch, i.e., with LINQ to XML's XDocument or some XmlWriter implementation?

I've tried the following so far (from the question "What is the simplest way to get indented XML with line breaks from XmlDocument?"):

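using System.Text;
using System.Xml;

// Extension methods must live in a non-generic static class; the class name is arbitrary.
static class XmlExtensions
{
    // Re-serializes the document, letting XmlWriter apply the indentation.
    static public string Beautify(this XmlDocument doc)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",
            NewLineChars = "\r\n",
            NewLineHandling = NewLineHandling.Replace
        };
        using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
            doc.Save(writer);
        }
        return sb.ToString();
    }
}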

But this removes the blank lines and gives me:

<?xml version="1.0" encoding="UTF-8"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>
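The blank lines are most likely lost before the writer ever runs: by default XmlDocument discards whitespace-only nodes while loading (its PreserveWhitespace property is false), so XmlWriter never sees them. A minimal sketch that makes this visible by counting the whitespace nodes under the root element with and without PreserveWhitespace; the class and method names are hypothetical:

```csharp
using System.Linq;
using System.Xml;

static class WhitespaceProbe
{
    // Hypothetical helper: counts whitespace-only child nodes of the root element.
    public static int CountWhitespaceNodes(string xml, bool preserveWhitespace)
    {
        // PreserveWhitespace only has an effect if it is set before loading.
        var doc = new XmlDocument { PreserveWhitespace = preserveWhitespace };
        doc.LoadXml(xml);
        return doc.DocumentElement.ChildNodes
            .Cast<XmlNode>()
            .Count(n => n.NodeType == XmlNodeType.Whitespace);
    }
}
```

For "<note>\n  <to>Tove</to>\n\n  <from>Jani</from>\n</note>", the count is 0 with the default setting and 3 with PreserveWhitespace = true (before <to>, between the two elements, and before </note>).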

Thanks in advance to anyone with comments or answers.

rzr

3 Answers


I'd try replacing all newlines with a custom tag (e.g. <newline></newline>), running the result through your existing Beautify code, and then replacing the newline tags with proper newlines again.

UPDATE: Thinking about this more, you might need to replace \n\n with \n, but you get the general idea.

Mark Feldman
  • Thanks for the suggestion. However, it would be really hard to avoid placing the tags inside other tags, which would produce invalid XML. – rzr Jan 13 '16 at 10:01
  • Going to mark this as the answer since the suggestion ultimately led me to a solution. The solution resembles the one given by Larry Smithmier in another answer, and involves two or more passes over the XML. – rzr Feb 05 '16 at 12:46
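A different technique, swapped in here and not taken from either answer, sidesteps the placeholder-tag problem raised in the comment above: LINQ to XML can keep whitespace-only text nodes when parsing with LoadOptions.PreserveWhitespace, and each of those nodes can be rewritten to the right indentation while keeping its number of line breaks. A minimal sketch, assuming that whitespace-only text between elements is purely formatting, that elements mixing real text with child elements should be left alone, and ignoring xml:space="preserve"; XmlReindenter and Reindent are hypothetical names:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlReindenter
{
    // Hypothetical helper: re-indents element content while keeping blank lines.
    public static string Reindent(string xml, string indent = "    ")
    {
        // PreserveWhitespace keeps the line breaks between elements as XText nodes.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        var formatting = doc.Root
            .DescendantNodes()
            .OfType<XText>()
            .Where(t => !(t is XCData) && string.IsNullOrWhiteSpace(t.Value))
            .ToList(); // materialize before modifying the tree

        foreach (XText ws in formatting)
        {
            XElement parent = ws.Parent;
            // Leave mixed content such as <p>Hello <b>world</b></p> untouched.
            bool mixed = parent.Nodes().OfType<XText>()
                .Any(t => !string.IsNullOrWhiteSpace(t.Value));
            // The parser normalizes CRLF to LF, so counting '\n' counts line breaks.
            int lineBreaks = ws.Value.Count(c => c == '\n');
            if (mixed || lineBreaks == 0)
                continue;

            // Whitespace before a child sits one level deeper than the parent;
            // whitespace before the parent's closing tag lines up with the parent.
            int depth = parent.Ancestors().Count() + (ws.NextNode == null ? 0 : 1);
            ws.Value = new string('\n', lineBreaks)
                + string.Concat(Enumerable.Repeat(indent, depth));
        }

        // XDocument.ToString omits the declaration, so it is prepended here.
        string declaration = doc.Declaration?.ToString() ?? "";
        // DisableFormatting writes the rewritten whitespace nodes exactly as set.
        return declaration + doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because the document is still parsed and re-serialized, this is not byte-for-byte preservation: details such as attribute quoting, line breaks between attributes, and the input's CRLF/LF style are not guaranteed to survive.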

Building on Mark's good suggestion, this will beautify the XML string (but the code isn't very pretty):

using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        string test = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<note>
    <to>Tove</to>

<from>Jani</from>
    <heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

</note>";
        string output = Test.BeautifyXML(test);
        Console.Write(output);
        Console.ReadLine();
    }
}
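static class Test
{
    static public string BeautifyXML(this string docString)
    {
        // Turn every CR and LF into an empty placeholder element so the line breaks
        // survive parsing, and drop the ones right after the XML declaration.
        // Assumes CRLF line endings: LF-only input leaves unmatched <n></n> placeholders.
        docString = Regex.Replace(docString.Replace("\r", "<r></r>").Replace("\n", "<n></n>"), @"\?>(<r></r><n></n>)*", "?>");
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(docString);
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",
            NewLineChars = "\r\n",
            NewLineHandling = NewLineHandling.Replace
        };
        // The writer puts each placeholder element on its own indented line.
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            doc.Save(writer);
        }
        // Remove the writer's own line breaks, turn each <r></r> ... <n></n> pair back into
        // "\r\n", and restore the break after "?>" (this also affects any other processing instruction).
        return Regex.Replace(sb.ToString().Replace("\r\n", ""), @"<r></r>( )*<n></n>", "\r\n").Replace("?>", "?>\r\n");
    }
}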

Output:

<?xml version="1.0" encoding="utf-16"?>
<note>
  <to>Tove</to>

  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>

</note>
Larry Smithmier

This might do the trick for you:

Instead of

NewLineHandling = NewLineHandling.Replace

use

NewLineHandling = NewLineHandling.None

The None setting tells the XmlWriter to leave the input unchanged: new-line characters are written as they are, with no replacement or entitization. Use it when you do not want any new-line processing.
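To see what None changes compared with Replace, here is a small sketch that writes one element whose text contains a bare \n under both settings. NewLineDemo and Write are hypothetical names; NewLineChars is set to "\r\n" as in the question's code:

```csharp
using System.Text;
using System.Xml;

static class NewLineDemo
{
    // Hypothetical helper: writes <body> with a bare "\n" in its text.
    public static string Write(NewLineHandling handling)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true, // keep the output to the single element
            NewLineChars = "\r\n",
            NewLineHandling = handling
        };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteElementString("body", "line one\nline two");
        }
        return sb.ToString();
    }
}
```

With Replace the "\n" in the text comes out as "\r\n" (the value of NewLineChars); with None it is written unchanged. The setting only governs line breaks the writer is asked to write; it does not re-create whitespace that was dropped when the document was loaded.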

Mohit S