
I have a large StringBuilder (~140MB) that I need to write in an XML attribute. I'm using XDocument to handle XML manipulation.

When trying to write the string to the XAttribute, I get a System.OutOfMemoryException (because I need to call StringBuilder.ToString(), which I suppose loads the whole string into memory).

var length = value.RawArtifact.Content.Length;
StringBuilder b = new StringBuilder();
int pos = 0;
while (pos < length - 1000)
{
    b.Append(BitConverter.ToString(value.RawArtifact.Content, pos, 1000).Replace("-", ""));
    pos += 1000;
}
b.Append(BitConverter.ToString(value.RawArtifact.Content, pos).Replace("-", ""));
var buffer = b.ToString(); // This throws an exception
myAttribute.SetAttributeValue("my-attribute", buffer);

I could not find any overload for SetAttributeValue that would take something like a StreamReader or anything, so I feel kind of stuck right now.

Any suggestions?

dbc
Arthur Attout
  • Not answering your question, but writing a string of 140MB into a single attribute sounds like a bad idea. Even having an XML of this size doesn't sound good to me. – Philippe Aug 26 '21 at 12:58
  • It is a terrible idea. I know... But unfortunately this decision was not up to me... In the end, however, the final result is still *theoretically* XML-compliant, so aside from its abhorrent size, I'm not doing anything that unusual. – Arthur Attout Aug 26 '21 at 13:00
  • If you check the [reference source for `XAttribute`](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.Xml.Linq/src/System/Xml/Linq/XAttribute.cs#L36) you will see that `XAttribute` has `internal string value;` so there is no way to use a `StringBuilder` or `StreamReader` as a value. – dbc Aug 26 '21 at 15:23
  • This doesn't precisely answer your question, but if you were to write your XML with `XmlWriter` you could write the attribute value in chunks using `XmlWriter.WriteChars()`. See https://dotnetfiddle.net/InBShg for a demo. – dbc Aug 26 '21 at 15:25
  • Thanks, that's actually a nifty idea! I'll have to be creative because I'm injecting my do-everything-related-to-XML class, but unless someone is quicker, I'll post what I come up with as an answer. – Arthur Attout Aug 26 '21 at 19:22
  • If you are modifying an existing XML file, you might consider a streaming transformation along the lines of [Combining the XmlReader and XmlWriter classes for simple streaming transformations](https://web.archive.org/web/20160214201608/https://blogs.msdn.com/b/mfussell/archive/2005/02/12/371546.aspx) and answers that use it including [File size restriction or limitation in C#](https://stackoverflow.com/a/12500278/3744182), [Edit a large XML file](https://stackoverflow.com/a/48575462/3744182) and [Automating replacing tables from external files](https://stackoverflow.com/a/28903486/3744182). – dbc Aug 26 '21 at 21:39
  • At one point I wrote an answer suggesting use of `XmlWriter.WriteChars()`, then realized you were using `XDocument` not `XmlWriter`, so I deleted the answer. Should I undelete it? – dbc Aug 26 '21 at 21:42
  • I'd say yes. What I'll do is, instead of directly writing those huge strings through `SetAttributeValue`, write them at the last moment by using a custom implementation of `XmlWriter` and injecting that writer into `XDocument.Save(XmlWriter writer)`. But this will probably be largely inspired by your proposition. – Arthur Attout Aug 27 '21 at 08:21
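The chunked-write idea from dbc's comment can be sketched with a plain XmlWriter: WriteStartAttribute() opens the attribute, each WriteChars() call appends one buffer, and WriteEndAttribute() closes it. This is a minimal sketch of that pattern, not code taken from the linked fiddle; the class ChunkedAttributeDemo, the names Root and Big, and the chunk size are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ChunkedAttributeDemo
{
    // Hypothetical helper: writes <Root Big="..."/> where the attribute value is
    // supplied to the writer in chunks rather than as one large string.
    public static string Write(char[] value, int chunkSize)
    {
        var sw = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteStartElement("Root");
            writer.WriteStartAttribute("Big");
            for (int pos = 0; pos < value.Length; pos += chunkSize)
            {
                // Never pass more characters than remain in the source array.
                // Real code must also avoid splitting a surrogate pair across calls.
                writer.WriteChars(value, pos, Math.Min(chunkSize, value.Length - pos));
            }
            writer.WriteEndAttribute();
            writer.WriteEndElement();
        }
        return sw.ToString();
    }
}
```

In real code each chunk would be produced on demand (for example by hex-encoding the next slice of the byte array), so the full value never exists as a single string.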

1 Answer


If you check the reference source for XAttribute you will see that XAttribute has internal string value; so there is no way to use a StringBuilder or StreamReader as a value.

Instead, you might consider a streaming approach where you inject the required attribute into the XML stream as you write out your XDocument. If you do, you can combine XmlWriter.WriteStartAttribute() with XmlWriter.WriteChars() to write your huge attribute value in chunks. The WriteChars() method:

can be used to write large amounts of text one buffer at a time.

It is therefore designed for exactly this situation. There are two basic ways you might implement streaming injection of an attribute value:

  1. Use the algorithm from Combining the XmlReader and XmlWriter classes for simple streaming transformations by Mark Fussell and inject the attribute while streaming from an XmlReader returned by XDocument.CreateReader() to an XmlWriter.

    For some examples, see File size restriction or limitation in C#, Edit a large XML file and Automating replacing tables from external files.

  2. Subclass XmlWriter itself and inject the attribute as your target element is being written.

    For an example see Custom xmlWriter to skip a certain element?.
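The first approach can be sketched as a shallow, node-by-node copy from XDocument.CreateReader() to an XmlWriter that adds the attribute when the target element's start tag is written. This is a minimal sketch in the spirit of the streaming-transformation pattern, not code from the linked article; StreamingAttributeInjector, CopyWithAttribute, and the writeExtraAttribute callback are hypothetical names, and only the node types an XDocument typically produces are handled.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class StreamingAttributeInjector
{
    // Shallow copy of every node from reader to writer. When the start tag of an
    // element named `target` is written, writeExtraAttribute(writer) is called after
    // the existing attributes have been copied, so it can add one more attribute
    // (e.g. with WriteStartAttribute()/WriteChars()/WriteEndAttribute()).
    // DocumentType and XmlDeclaration nodes are skipped in this sketch.
    public static void CopyWithAttribute(XmlReader reader, XmlWriter writer,
        XName target, Action<XmlWriter> writeExtraAttribute)
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    bool isEmpty = reader.IsEmptyElement;
                    bool isTarget = XName.Get(reader.LocalName, reader.NamespaceURI) == target;
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true); // copies attributes, then returns to the element
                    if (isTarget)
                        writeExtraAttribute(writer);
                    if (isEmpty)
                        writer.WriteEndElement();
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
            }
        }
    }
}
```

The XDocument itself still has to fit in memory, but the injected attribute value never has to exist as a single string.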

Adopting the second approach, first create the following extension methods:

public static partial class XmlExtensions
{
    public static void WriteAttribute(this XmlWriter writer, string localName, IEnumerable<(char [] Buffer, int Length)> valueSegments) =>
        WriteAttribute(writer, null, localName, null, valueSegments);
        
    public static void WriteAttribute(this XmlWriter writer, string localName, string namespaceUri, IEnumerable<(char [] Buffer, int Length)> valueSegments) =>
        WriteAttribute(writer, null, localName, namespaceUri, valueSegments);
    
    public static void WriteAttribute(this XmlWriter writer, string prefix, string localName, string namespaceUri, IEnumerable<(char [] Buffer, int Length)> valueSegments)
    {
        writer.WriteStartAttribute(prefix, localName, namespaceUri);
        char [] surrogateBuffer = null;

        // According to the docs, surrogate pairs cannot be split across calls to WriteChars():
        // https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmlwriter.writechars?view=net-5.0#remarks
        // So if the last character of a segment is a high surrogate, buffer it and write it with the first character of the next buffer.
        foreach (var segment in valueSegments)
        {
            if (segment.Length < 1)
                continue;
            int start = 0;
            if (surrogateBuffer != null && surrogateBuffer[0] != '\0')
            {
                surrogateBuffer[1] = segment.Buffer[start++];
                writer.WriteChars(surrogateBuffer, 0, 2);
                surrogateBuffer[0] = surrogateBuffer[1] = '\0';
            }
            int count = segment.Length - start;
            if (count > 0 && char.IsHighSurrogate(segment.Buffer[segment.Length-1]))
            {
                (surrogateBuffer = surrogateBuffer ?? new char[2])[0] = segment.Buffer[segment.Length-1];
                count--;
            }
            writer.WriteChars(segment.Buffer, start, count);
        }
        // Check for a dangling high surrogate before closing the attribute.
        if (surrogateBuffer != null && surrogateBuffer[0] != '\0')
            throw new XmlException(string.Format("Unterminated surrogate pair: U+{0:X4}", (int)surrogateBuffer[0]));
        writer.WriteEndAttribute();
    }
}

public static class ByteExtensions
{
    // Copied from this answer https://stackoverflow.com/a/14333437
    // By https://stackoverflow.com/users/445517/codesinchaos
    // To https://stackoverflow.com/questions/311165/how-do-you-convert-a-byte-array-to-a-hexadecimal-string-and-vice-versa
    // And modified to populate a char span rather than return a string.
    public static void ByteToHexBitFiddle(ReadOnlySpan<byte> bytes, Span<char> c)
    {
        if (c.Length < 2* bytes.Length)
            throw new ArgumentException("c.Length < 2* bytes.Length");
        int b;
        for (int i = 0; i < bytes.Length; i++) {
            b = bytes[i] >> 4;
            c[i * 2] = (char)(55 + b + (((b-10)>>31)&-7));
            b = bytes[i] & 0xF;
            c[i * 2 + 1] = (char)(55 + b + (((b-10)>>31)&-7));
        }
    }
    
    public static IEnumerable<(char [] segment, int length)> GetHexCharSegments(ReadOnlyMemory<byte> bytes, int chunkSize = 1000)
    {
        var buffer = new char[2*chunkSize];
        var length = bytes.Length;
        int pos = 0;
        while (pos < length - chunkSize)
        {
            ByteExtensions.ByteToHexBitFiddle(bytes.Span.Slice(pos, chunkSize), buffer);
            yield return (buffer, buffer.Length);
            pos += chunkSize;
        }
        ByteExtensions.ByteToHexBitFiddle(bytes.Span.Slice(pos), buffer);
        yield return (buffer, 2*(length - pos));
    }
}
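The bit fiddle in ByteToHexBitFiddle() avoids a lookup table. For a nibble b, (b - 10) >> 31 is -1 (all bits set) when b < 10 and 0 otherwise, so masking with -7 subtracts 7 only for decimal digits: 55 + b - 7 = 48 + b gives '0' to '9', and 55 + b gives 'A' to 'F' for b >= 10. The check below, using a hypothetical NibbleCheck class, compares that expression against a plain lookup string for all 16 nibbles.

```csharp
public static class NibbleCheck
{
    // Same expression as in ByteToHexBitFiddle(): maps 0..15 to '0'..'9', 'A'..'F'.
    public static char NibbleToHex(int b) => (char)(55 + b + (((b - 10) >> 31) & -7));

    // Returns true if the formula agrees with a straightforward lookup for every nibble.
    public static bool MatchesLookup()
    {
        const string lookup = "0123456789ABCDEF";
        for (int b = 0; b < 16; b++)
            if (NibbleToHex(b) != lookup[b])
                return false;
        return true;
    }
}
```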

Next, subclass XmlWriter as follows:

public class ElementEventArgs : EventArgs
{
    public XName Element { get; init; }
    public Stack<XName> ElementStack { get; init; }
}

public class NotifyingXmlWriter : XmlWriterProxy
{
    readonly Stack<XName> elements = new Stack<XName>();

    public NotifyingXmlWriter(XmlWriter baseWriter) : base(baseWriter) { }

    public event EventHandler<ElementEventArgs> OnElementStarted;
    public event EventHandler<ElementEventArgs> OnElementEnded;

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        base.WriteStartElement(prefix, localName, ns);
        // XmlWriter callers may pass null rather than "" for "no namespace".
        var name = XName.Get(localName, ns ?? "");
        elements.Push(name);
        OnElementStarted?.Invoke(this, new ElementEventArgs { Element = name, ElementStack = elements });
    }

    public override void WriteEndElement()
    {
        base.WriteEndElement();
        var name = elements.Pop(); // Pop after base.WriteEndElement() lets the base class throw an exception on a stack error.
        OnElementEnded?.Invoke(this, new ElementEventArgs { Element = name, ElementStack = elements });
    }
}

public class XmlWriterProxy : XmlWriter
{
    // Taken from this answer https://stackoverflow.com/a/32150990/3744182
    // by https://stackoverflow.com/users/3744182/dbc
    // To https://stackoverflow.com/questions/32149676/custom-xmlwriter-to-skip-a-certain-element
    // NOTE: async methods not implemented
    readonly XmlWriter baseWriter;

    public XmlWriterProxy(XmlWriter baseWriter) => this.baseWriter = baseWriter ?? throw new ArgumentNullException(nameof(baseWriter));

    protected virtual bool IsSuspended { get { return false; } }

    public override void Close() => baseWriter.Close();

    public override void Flush() => baseWriter.Flush();

    public override string LookupPrefix(string ns) => baseWriter.LookupPrefix(ns);

    public override void WriteBase64(byte[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteBase64(buffer, index, count);
    }

    public override void WriteCData(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteCData(text);
    }

    public override void WriteCharEntity(char ch)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteCharEntity(ch);
    }

    public override void WriteChars(char[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteChars(buffer, index, count);
    }

    public override void WriteComment(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteComment(text);
    }

    public override void WriteDocType(string name, string pubid, string sysid, string subset)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteDocType(name, pubid, sysid, subset);
    }

    public override void WriteEndAttribute()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndAttribute();
    }

    public override void WriteEndDocument()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndDocument();
    }

    public override void WriteEndElement()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEndElement();
    }

    public override void WriteEntityRef(string name)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteEntityRef(name);
    }

    public override void WriteFullEndElement()
    {
        if (IsSuspended)
            return;
        baseWriter.WriteFullEndElement();
    }

    public override void WriteProcessingInstruction(string name, string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteProcessingInstruction(name, text);
    }

    public override void WriteRaw(string data)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteRaw(data);
    }

    public override void WriteRaw(char[] buffer, int index, int count)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteRaw(buffer, index, count);
    }

    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteStartAttribute(prefix, localName, ns);
    }

    public override void WriteStartDocument(bool standalone) => baseWriter.WriteStartDocument(standalone);

    public override void WriteStartDocument() => baseWriter.WriteStartDocument();

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteStartElement(prefix, localName, ns);
    }

    public override WriteState WriteState => baseWriter.WriteState;

    public override void WriteString(string text)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteString(text);
    }

    public override void WriteSurrogateCharEntity(char lowChar, char highChar)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteSurrogateCharEntity(lowChar, highChar);
    }

    public override void WriteWhitespace(string ws)
    {
        if (IsSuspended)
            return;
        baseWriter.WriteWhitespace(ws);
    }
}   

And now you will be able to do something like:

string fileName = @"Question68941254.xml"; // or whatever

XNamespace targetNamespace = "";
XName targetName = targetNamespace + "TheNode";

using (var textWriter = new StreamWriter(fileName))
using (var innerXmlWriter = XmlWriter.Create(textWriter, new XmlWriterSettings { Indent = true }))
using (var xmlWriter = new NotifyingXmlWriter(innerXmlWriter))
{
    xmlWriter.OnElementStarted += (o, e) =>
    {
        if (e.Element == targetName)
        {
            // Add the attribute with the byte hex value to the target element.
            ((XmlWriter)o).WriteAttribute("TheAttribute", ByteExtensions.GetHexCharSegments(value.RawArtifact.Content.AsMemory()));
        }
    };
    xdocument.WriteTo(xmlWriter);
}

Here xdocument is the XDocument you have already populated; the code adds the attribute TheAttribute to the element TheNode as the document is written out, and value.RawArtifact.Content is the byte array from your question.

Notes:

  • Since your code shows you are populating the StringBuilder by converting a large byte array to a large hex string buffer, I eliminated the intermediate StringBuilder and wrote the byte array directly in chunks.

    If you do need to write the contents of some StringBuilder b in chunks, use

    public static partial class StringBuilderExtensions
    {
        public static IEnumerable<(char [] segment, int length)> GetSegments(this StringBuilder sb, int bufferSize = 1024)
        {
            var buffer = new char[bufferSize];
            for (int i = 0; i < sb.Length; i += buffer.Length)
            {
                int length = Math.Min(buffer.Length, sb.Length - i);
                sb.CopyTo(i, buffer, 0, length);
                yield return (buffer, length);
            }
        }
    }
    

    And pass b.GetSegments() to XmlExtensions.WriteAttribute().
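GetSegments() may end a segment on a high surrogate; WriteAttribute() copes with that by holding the character back until the next segment. An alternative, sketched here under the assumption that the StringBuilder contains well-formed UTF-16, is to shorten any segment that would end on a high surrogate so pairs are never split at the source. SurrogateSafeSegments and GetSafeSegments are hypothetical names, and bufferSize must be at least 2.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SurrogateSafeSegments
{
    // Like GetSegments(), but never ends a segment on a high surrogate when more
    // text follows, so each (buffer, length) pair can go straight to WriteChars().
    // The same buffer is reused, so consume each segment before requesting the next.
    public static IEnumerable<(char[] Buffer, int Length)> GetSafeSegments(this StringBuilder sb, int bufferSize = 1024)
    {
        if (bufferSize < 2)
            throw new ArgumentOutOfRangeException(nameof(bufferSize), "Need room for a surrogate pair.");
        var buffer = new char[bufferSize];
        int i = 0;
        while (i < sb.Length)
        {
            int length = Math.Min(buffer.Length, sb.Length - i);
            // Hold back a trailing high surrogate so it is emitted with its low surrogate.
            if (length > 1 && i + length < sb.Length && char.IsHighSurrogate(sb[i + length - 1]))
                length--;
            sb.CopyTo(i, buffer, 0, length);
            yield return (buffer, length);
            i += length;
        }
    }
}
```

This keeps the boundary logic in the producer; the surrogate buffering inside WriteAttribute() remains a useful safety net for other segment sources.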

The demo fiddle produces this result:

<?xml version="1.0" encoding="utf-8"?>
<Root>
  <SomeOtherNode>some value</SomeOtherNode>
  <TheNode TheAttribute="000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B">
    <foo></foo>the node value</TheNode>
  <AnotherNode>another value</AnotherNode>
</Root>
dbc