File size restriction or limitation in C#

Question

I want to generate an XML file from a single object (which contains nested collections) holding a large amount of data, but there is a limitation: the XML file can't exceed 50 MB.

Is there a good way to do this?

Update: speed is not important; the main thing is to split the output into files of at most 50 MB each.

What are you going to do with the rest of it? What would your output files look like? — Samuel Edwin Ward, Sep 19 '12 at 23:58
What are you having problems with as far as splitting it up into 50 MB files? — Austin Henley, Sep 20 '12 at 17:44
Please refer to https://stackoverflow.com/questions/25222674/streamwriter-limit-in-c-sharp-in-text-file — LakshmiSarada, Oct 09 '18 at 20:03

johnny g · Answer 1 · 2012-12-05T17:02:13.767

I ran into a similar requirement in my work. My best effort (intuitive, easy to implement, relatively performant) is as follows. I write with an XmlWriter while monitoring the underlying stream. When the stream surpasses my file size limit, I complete the current XML fragment, save the file, and close the stream.

Then on a second pass, I load the full DOM into memory, and iteratively remove nodes and save document until it is of acceptable size.

For example

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// arbitrary limit of 10MB
long FileSizeLimit = 10*1024*1024;

// open file stream to monitor file size
using (FileStream file = new FileStream("some.data.xml", FileMode.Create))
using (XmlWriter writer = XmlWriter.Create(file))
{
    writer.WriteStartElement("root");

    // keep writing until the stream reaches FileSizeLimit; the writer
    // buffers output, so the file may overshoot the limit slightly
    while (file.Length < FileSizeLimit)
    {
        // write contents
        writer.WriteElementString(
            "data", 
            string.Format("{0}/{0}/{0}/{0}/{0}", Guid.NewGuid()));
    }

    // complete fragment; this is the trickiest part, 
    // since a complex document may have an arbitrarily
    // long tail, and cannot be known during file size
    // sampling above
    writer.WriteEndElement();
    writer.Flush();
}

// iteratively reduce document size
// NOTE: XDocument will load full DOM into memory
XDocument document = XDocument.Load("some.data.xml");
XElement root = document.Element("root");
while (new FileInfo("some.data.xml").Length > FileSizeLimit &&
    root.LastNode != null) // stop if there is nothing left to remove
{
    root.LastNode.Remove();
    document.Save("some.data.xml");
}

There are ways to improve this. If memory is a constraint, one possibility is to rewrite the iterative step: count the nodes actually written in the first pass, then rewrite the file with one element fewer, and repeat until the full document is of the desired size.

This last recommendation may be the route to go, especially if you already need to track elements written to resume writing in another file.
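
As a rough illustration of splitting while writing, here is a minimal sketch (not the code I actually use). It assumes every record can be serialized on its own as a `<data>` element under a `<root>` container. The names `XmlSplitSketch`, `WriteSplit` and `TailReserve`, and the `baseName.N.xml` naming scheme, are hypothetical. Each element is serialized up front so its exact UTF-8 size is known before it is written, and a new file is started when the next element would push the current one past the limit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlSplitSketch
{
    // Bytes kept free for the pending ">" of <root> and the closing </root>.
    // Assumption: must be enlarged if the document has a longer tail.
    private const int TailReserve = 16;

    // Writes each item as <data>item</data> under <root>, rolling over to a
    // new file when the next item would not fit. Returns the files written.
    public static List<string> WriteSplit(
        IEnumerable<string> items, string baseName, long maxBytes)
    {
        var paths = new List<string>();
        FileStream file = null;
        XmlWriter writer = null;
        int itemsInFile = 0;
        try
        {
            foreach (string item in items)
            {
                // Serialize first so the exact size of the element is known.
                string fragment = new XElement("data", item)
                    .ToString(SaveOptions.DisableFormatting);
                long fragmentBytes = Encoding.UTF8.GetByteCount(fragment);

                if (writer != null)
                {
                    // Flush so file.Length reflects everything written so far.
                    writer.Flush();
                    bool full =
                        file.Length + fragmentBytes + TailReserve > maxBytes;
                    if (full && itemsInFile > 0)
                    {
                        writer.WriteEndElement();
                        writer.Dispose();
                        file.Dispose();
                        writer = null;
                        file = null;
                    }
                }

                if (writer == null)
                {
                    string path = string.Format(
                        "{0}.{1}.xml", baseName, paths.Count + 1);
                    file = new FileStream(path, FileMode.Create);
                    writer = XmlWriter.Create(file);
                    writer.WriteStartElement("root");
                    paths.Add(path);
                    itemsInFile = 0;
                }

                writer.WriteRaw(fragment);
                itemsInFile++;
            }
            if (writer != null) writer.WriteEndElement();
        }
        finally
        {
            if (writer != null) writer.Dispose();
            if (file != null) file.Dispose();
        }
        return paths;
    }
}
```

Flushing before every size check is slow, but it keeps the measurement exact, which matters more here since speed is not the concern. A single element larger than the limit still gets a file of its own, because it cannot be split at this level.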

Hope this helps!

EDIT

Although the approach above is intuitive and easier to implement, I felt it was worth investigating the optimization mentioned above. This is what I got.

An extension method that helps write ancestor nodes (i.e. container nodes and all other kinds of markup):

// performs a shallow copy of a given node. courtesy of Mark Fussell
// http://blogs.msdn.com/b/mfussell/archive/2005/02/12/371546.aspx
public static void WriteShallowNode(this XmlWriter writer, XmlReader reader)
{

    switch (reader.NodeType)
    {
        case XmlNodeType.Element:
            writer.WriteStartElement(
                reader.Prefix, 
                reader.LocalName, 
                reader.NamespaceURI);
            writer.WriteAttributes(reader, true);
            if (reader.IsEmptyElement)
            {
                writer.WriteEndElement();
            }
            break;
        case XmlNodeType.Text: writer.WriteString(reader.Value); break;
        case XmlNodeType.Whitespace:
        case XmlNodeType.SignificantWhitespace:
            writer.WriteWhitespace(reader.Value);
            break;
        case XmlNodeType.CDATA: writer.WriteCData(reader.Value); break;
        case XmlNodeType.EntityReference: 
            writer.WriteEntityRef(reader.Name); 
            break;
        case XmlNodeType.XmlDeclaration:
        case XmlNodeType.ProcessingInstruction:
            writer.WriteProcessingInstruction(reader.Name, reader.Value);
            break;
        case XmlNodeType.DocumentType:
            writer.WriteDocType(
                reader.Name, 
                reader.GetAttribute("PUBLIC"), 
                reader.GetAttribute("SYSTEM"), 
                reader.Value);
            break;
        case XmlNodeType.Comment: writer.WriteComment(reader.Value); break;
        case XmlNodeType.EndElement: writer.WriteFullEndElement(); break;
    }
}

and a method that performs the trimming (not an extension method, since extending any of the parameter types would be a bit ambiguous):

// trims xml file to specified file size. does so by 
// counting number of "victim candidates" and then iteratively
// trimming these candidates one at a time until resultant
// file size is just less than desired limit. does not
// consider nested victim candidates.
public static void TrimXmlFile(string filename, long size, string trimNodeName)
{
    long fileSize = new FileInfo(filename).Length;
    long workNodeCount = 0;

    // count number of victim elements in xml
    if (fileSize > size)
    {
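        using (XmlReader countReader = XmlReader.Create(filename))
        {
            // Skip() already advances to the node after the element, so
            // only call Read() when the current node was not a victim;
            // otherwise adjacent victims would be missed
            bool more = countReader.Read();
            while (more)
            {
                if (countReader.NodeType == XmlNodeType.Element && 
                    countReader.Name == trimNodeName)
                {
                    workNodeCount++;
                    countReader.Skip();
                    more = !countReader.EOF;
                }
                else
                {
                    more = countReader.Read();
                }
            }
        }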
    }

    // if greater than desired file size, and there is at least
    // one victim candidate
    string workFilename = filename+".work";
    for (; 
        fileSize > size && workNodeCount > 0; 
        fileSize = new FileInfo(filename).Length)
    {
        workNodeCount--;
        using (FileStream readFile = new FileStream(filename, FileMode.Open))
        using (FileStream writeFile = new FileStream(
            workFilename, 
            FileMode.Create))
        {
            XmlReader reader = XmlReader.Create(readFile);
            XmlWriter writer = XmlWriter.Create(writeFile);

            long j = 0;
            bool hasAlreadyRead = false;
            for (; (hasAlreadyRead) || reader.Read(); )
            {

                // if node is a victim node
                if (reader.NodeType == XmlNodeType.Element && 
                    reader.Name == trimNodeName)
                {
                    // if we have not surpassed this iteration's
                    // allowance, preserve node
                    if (j < workNodeCount)
                    {
                        writer.WriteNode(reader, true);
                    }
                    j++;

                    // if we have exceeded this iteration's
                    // allowance, trim node (and whitespace)
                    if (j >= workNodeCount)
                    {
                        reader.ReadToNextSibling(trimNodeName);
                    }
                    hasAlreadyRead = true;
                }
                else
                {
                    // some other xml content we should preserve
                    writer.WriteShallowNode(reader);
                    hasAlreadyRead = false;
                }
            }
            writer.Flush();
        }
        File.Copy(workFilename, filename, true);
    }
    File.Delete(workFilename);
}

If your XML contains whitespace formatting, any whitespace between the last remaining victim node and the closing tag of the container element is lost. This can be mitigated by altering the skip clause (moving the j++ statement after the skip), but then you end up with additional whitespace. The solution presented above generates a replica of the source file with minimal file size.

Francis B. · Answer 2 · 2009-08-19T03:57:13.877

You can write a big XML file with XmlWriter or XDocument without any problem.

Here is a sample. This example generates a 63 MB XML file in less than 5 seconds, using the XmlWriter class.

using System.Xml;

using (XmlWriter writer = XmlWriter.Create("YourFilePath"))
{
    writer.WriteStartDocument();

    writer.WriteStartElement("Root");

    for (int i = 0; i < 1000000; i++) //Write one million nodes.
    {
        writer.WriteStartElement("Root");
        writer.WriteAttributeString("value", "Value #" + i.ToString());
        writer.WriteString("Inner Text #" + i.ToString());
        writer.WriteEndElement();
    }
    writer.WriteEndElement();

    writer.WriteEndDocument();
}

I have written and read XML files of multiple gigabytes using this method, and it works fine. For extra credit, you can hook it up through a GZipStream to compress the file as well... — Ben Childs, Aug 19 '09 at 03:19
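
A minimal sketch of the GZipStream suggestion, under some assumptions: the class and method names (`GzipXmlSketch`, `WriteCompressed`) and the `Item` element name are hypothetical, and the content mirrors the loop above. The XmlWriter writes into the GZipStream, which in turn writes into the file.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class GzipXmlSketch
{
    // Writes `count` elements into a gzip-compressed XML file at `path`.
    public static void WriteCompressed(string path, int count)
    {
        // Disposal runs in reverse order: the writer flushes into the gzip
        // stream, the gzip stream writes its footer, then the file closes.
        using (FileStream file = new FileStream(path, FileMode.Create))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Compress))
        using (XmlWriter writer = XmlWriter.Create(gzip))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Root");
            for (int i = 0; i < count; i++)
            {
                writer.WriteStartElement("Item");
                writer.WriteAttributeString("value", "Value #" + i);
                writer.WriteString("Inner Text #" + i);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

Note that with compression in place, the length of the file stream measures the compressed size, so this only helps with a size limit if the limit applies to the file on disk and the consumer can read gzip.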

score 1 · Answer 3 · answered Aug 19 '09 at 02:44

Have you considered writing the XML file as a string instead of using the XML support in .NET?

I was writing ~10GB of data to XML, as it was the only way a tool could consume it.

I had a problem like this, but my XML was so simple that I just used a TextWriter and nested for loops to write the XML.

Worked a charm, plus was a lot faster than the XML object.
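
For illustration, a minimal sketch of what that can look like, assuming a two-level structure of named groups containing string items. The names `TextWriterXmlSketch`, `Escape` and `Write`, and the `group`/`item` element names, are hypothetical. The one thing you must handle yourself when bypassing the XML classes is escaping.

```csharp
using System.Collections.Generic;
using System.IO;

public static class TextWriterXmlSketch
{
    // Escapes characters that cannot appear verbatim in XML text or in
    // double-quoted attribute values. "&" has to be replaced first.
    public static string Escape(string value)
    {
        return value
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;");
    }

    // Writes the nested collection with plain text output and nested loops.
    public static void Write(
        TextWriter output,
        IEnumerable<KeyValuePair<string, string[]>> groups)
    {
        output.WriteLine("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        output.WriteLine("<root>");
        foreach (KeyValuePair<string, string[]> group in groups)
        {
            output.WriteLine("  <group name=\"{0}\">", Escape(group.Key));
            foreach (string item in group.Value)
            {
                output.WriteLine("    <item>{0}</item>", Escape(item));
            }
            output.WriteLine("  </group>");
        }
        output.WriteLine("</root>");
    }
}
```

This sketch does not filter control characters that are invalid in XML, and the caller has to make sure the TextWriter really uses the encoding named in the declaration.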

Everything is faster than the xml object ;) – NotMe Aug 19 '09 at 02:45
