
I have an XElement that I have to serialize with the white space removed from its self-closing tags (`<tag />` must become `<tag/>`). My code looks like this:

var stringBuilder = new StringBuilder();
using (var stringWriter = new StringWriter(stringBuilder))
{
    xelement.Save(stringWriter);
}
stringBuilder.Replace(" />", "/>");
var xml = stringBuilder.ToString();

Basically, I'm writing the XML into a StringBuilder and replacing the unneeded white space. The resulting string looks fine, except that it has the XML declaration. I know that with an XmlWriter I can omit the declaration via OmitXmlDeclaration, but StringWriter doesn't have this.

Is there a way to do this, or do I need to manually parse out the declaration from the resulting string?

For clarity, here is the before and after XML:

// Before
<actionitem actiontaken="none" target="0" targetvariable="0">
  <windowname>Popup Window</windowname>
  <windowposx>-1</windowposx>
  <windowposy>-1</windowposy>
  <windowwidth>-1</windowwidth>
  <windowheight>-1</windowheight>
  <noscrollbars>false</noscrollbars>
  <nomenubar />
  <notoolbar />
  <noresize />
  <nostatus />
  <nolocation />
  <browserWnd />
</actionitem>

// After
<?xml version="1.0" encoding="utf-16"?>
<actionitem actiontaken="none" target="0" targetvariable="0">
  <windowname>Popup Window</windowname>
  <windowposx>-1</windowposx>
  <windowposy>-1</windowposy>
  <windowwidth>-1</windowwidth>
  <windowheight>-1</windowheight>
  <noscrollbars>false</noscrollbars>
  <nomenubar/>
  <notoolbar/>
  <noresize/>
  <nostatus/>
  <nolocation/>
  <browserWnd/>
</actionitem>

EDIT: For those that asked, this is for a Department of Defense project. Their specifications are locked in. That means, no white space in the closing tag, no matter how much I protest. Regardless of what's right or not, they don't want it, and they're signing the paycheck. I just try to accommodate them.

Kevin
  • Why do you need to remove the whitespace and the declaration? Are you using some sort of parser that doesn't accept valid XML? – D Stanley Feb 17 '14 at 19:44
  • That one byte saved per self-closing element is not worth the problems you are going to cause yourself by treating XML as a string. XML is not a string, it is not text, it is a transfer format. If you want to change this, write a new XML writer. Really, though, http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 - stop treating XML as text, you *WILL* be sorry. – Jonathan Dickinson Feb 17 '14 at 19:47
  • I agree with @JonathanDickinson. Saving those spaces is not worth the hassle or the risk. Also be sure to test this with CDATA etc. – H H Feb 17 '14 at 19:54
  • Side note - I think your post reads as: "I know that correct way is to create `XmlWriter` with my `StringWriter` but I want to use string manipulations to build XML..." – Alexei Levenkov Feb 17 '14 at 20:07
  • @JonathanDickinson I totally agree. However, I have no control over this. The government agency's specs are law, and we can't change them no matter how much we protest. So, we have to find ways to accommodate them. Otherwise, we're non-compliant and lose out. – Kevin Feb 17 '14 at 21:40
  • @AlexeiLevenkov My XML is already built. I just want to remove the whitespace. If it wasn't for that, I'd have no need to use string manipulation or stringbuilders. – Kevin Feb 17 '14 at 22:00
  • Why are you showing `xelement.Save` in your sample? Consider fixing the sample to avoid confusion... If you are able to call `Save`, clearly you'd use `using(var writer = XmlWriter.Create(stringWriter, new XmlWriterSettings{OmitXmlDeclaration = true} )){xelement.Save(writer);}` instead of manually stripping the declaration... – Alexei Levenkov Feb 17 '14 at 22:23
  • @AlexeiLevenkov I had to save it as stringbuilder in order to .Replace the whitespace. Which worked fine, other than adding the declaration to the XML. I know that manipulating XML as a string isn't the right thing to do, but in this case I have to. Unless, of course, you can tell me how to have my generated XML not include whitespace in the closing tags, which is what led me to using the stringbuilder in the first place. – Kevin Feb 17 '14 at 22:34
  • You can try to play with custom `XmlWriter` and overriding `WriteEndElement`... but may be too hard - http://stackoverflow.com/questions/4600569/remove-the-space-of-ending-element ... – Alexei Levenkov Feb 17 '14 at 22:56
  • @KevinJ that makes a whole lot more sense. I'll write up an answer shortly. – Jonathan Dickinson Feb 18 '14 at 07:38
  • No wonder DOD always overspends. How many man hours have been poured into this useless requirement?! Too many. – Gusdor Feb 18 '14 at 08:06
  • @Gusdor I totally agree. If it wasn't for this small 'requirement' I'd already be done with this piece. – Kevin Feb 19 '14 at 11:47
  • @KevinJ You are in my prayers. Godspeed brave sir! – Gusdor Feb 19 '14 at 11:54
  • @Gusdor Ha ha! Thanks for the prayers. I need all the help I can get. – Kevin Feb 19 '14 at 11:58

2 Answers


Use ToString() instead of Save(). That eliminates the need for the StringBuilder too.

 string xml = xelement.ToString(); // no declaration element added
 xml = xml.Replace(" />", "/>");   // if you really think you must
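
A minimal sketch of the same approach wrapped in a helper. The class and method names (`CompactXml`, `ToCompactString`) are made up for illustration and are not part of the original answer. It relies on `XElement.ToString()` serializing the element without an XML declaration, and it assumes, as in the question, that the XML contains no comments or CDATA sections in which the sequence ` />` could appear literally.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper; the names here are illustrative only.
public static class CompactXml
{
    // Serializes the element without an XML declaration and removes the
    // single space that the serializer emits before "/>" in empty elements.
    public static string ToCompactString(XElement element)
    {
        // XElement.ToString() never writes the <?xml ...?> declaration.
        string xml = element.ToString();

        // Plain string replacement: acceptable only under the assumption
        // stated above (no comments or CDATA containing " />").
        return xml.Replace(" />", "/>");
    }
}
```

Calling `CompactXml.ToCompactString(new XElement("actionitem", new XElement("nomenubar")))` should give indented output in which the child is written as `<nomenubar/>` and no declaration line is present.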
H H
  • Henk, I'll try this out as soon as I can. I'm not at work now and don't have access to the program. But, thanks so much for the help! I greatly appreciate it. – Kevin Feb 17 '14 at 22:36
  • Thanks, Henk! This worked, and is much simpler than me trying to use a string builder. – Kevin Feb 19 '14 at 12:13

Disclaimer: this answer applies to Kevin J and Kevin J alone. Do not perform string manipulation on XML.

If you still want to use the StringBuilder/StringWriter, you can wire the StringWriter (which is a TextWriter) through an XmlWriter:

var xe = new XElement("test",
    new XElement("child1"),
    new XElement("child2"));

var sb = new StringBuilder();
using (var writer = new StringWriter(sb))
using (var xr = XmlWriter.Create(writer, new XmlWriterSettings()
{
    OmitXmlDeclaration = true
}))
{
    xe.Save(xr);
}

sb.Replace(" />", "/>");

As for the Replace, I wrote a function that should be a bit more resilient against corner cases (comments, CDATA). It should also use fewer cycles and consume less memory.

static void StripClosingWhitespace(StringBuilder sb)
{
    var inComment = false;
    var inCData = false;

    for (var i = 0; i < sb.Length; i++)
    {
        var c = sb[i];
        if (inComment)
        {
            if (c == '>' && sb[i - 1] == '-' && sb[i - 2] == '-')
                inComment = false;
        }
        else if (inCData)
        {
            if (c == '>' && sb[i - 1] == ']' && sb[i - 2] == ']')
                inCData = false;
        }
        else if (i > 2 && c == '-' && sb[i - 1] == '-' && sb[i - 2] == '!' && sb[i - 3] == '<')
        {
            inComment = true;
        }
        else if (i > 7 && 
            c == '[' && 
            sb[i - 1] == 'A' && sb[i - 2] == 'T' && sb[i - 3] == 'A' && sb[i - 4] == 'D' && sb[i - 5] == 'C' &&
            sb[i - 6] == '[' && sb[i - 7] == '!' && sb[i - 8] == '<')
        {
            inCData = true;
        }
        else if (i > 2 && c == '>' && sb[i - 1] == '/' && char.IsWhiteSpace(sb[i - 2]))
        {
            sb.Remove(i - 2, 1);
            i--;
        }
        else
        {
            // Do nothing
        }
    }
}
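
To illustrate the corner case that motivates the function above, here is a small sketch showing how a plain `Replace(" />", "/>")` can alter the content of a CDATA section. The class and method names (`NaiveReplaceCheck`, `ChangesContent`) are hypothetical and exist only for this demonstration.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demonstration class; not part of the original answer.
public static class NaiveReplaceCheck
{
    // Returns true when the plain string replacement changes the text
    // content of the element, not just the form of its empty tags.
    public static bool ChangesContent(XElement element)
    {
        // Serialize without indentation so no extra whitespace is added.
        string original = element.ToString(SaveOptions.DisableFormatting);
        string replaced = original.Replace(" />", "/>");

        // Re-parse, keeping whitespace, and compare the concatenated text.
        XElement reparsed = XElement.Parse(replaced, LoadOptions.PreserveWhitespace);
        return reparsed.Value != element.Value;
    }
}
```

For an element such as `new XElement("test", new XCData("literal <br /> markup"), new XElement("child"))`, the check should return true, because the ` />` inside the CDATA section is rewritten along with the empty `child` tag. For an element with only empty children it should return false.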
Jonathan Dickinson
  • Jonathan, I do appreciate you helping me out by writing this routine. However, in this case, Henk's answer won out. Since the XML being built by the program will never have comments or CDATA (this XML is used to build SCORM-compliant lesson files), I couldn't use your routine. I do greatly appreciate your input and help, though. Thanks! – Kevin Feb 19 '14 at 12:11