
I am trying to convert a zip file into a text (XML) file using the following methods. It works fine for smaller files but does not seem to work for files larger than 50 MB.

using System;
using System.IO;

class Program
{
    public static void Main(string[] args)
    {
        try
        {
            string importFilePath = @"D:\CorpTax\Tasks\966442\CS Publish error\CSUPD20180604L.zip";

            int maxLengthInMb = 20;
            byte[] payLoad = File.ReadAllBytes(importFilePath);
            int payLoadInMb = (payLoad.Length / 1024) / 1024;
            bool splitIntoMultipleFiles = (payLoadInMb / maxLengthInMb) > 1;
            int payLoadLength = splitIntoMultipleFiles ? maxLengthInMb * 1024 * 1024 : payLoad.Length;

            if (splitIntoMultipleFiles)
            {
                foreach (byte[] splitPayLoad in payLoad.Slices(payLoadLength))
                {
                    // Convert the current slice, not the whole payload.
                    ToXml(splitPayLoad);
                }
            }
        }
        catch (Exception)
        {
            // Rethrow without losing the original exception type and stack trace.
            throw;
        }
    }

    public static string ToXml(byte[] payLoad)
    {
        // XmlStringWriter is a custom helper class whose code is not shown.
        using (XmlStringWriter xmlStringWriter = new XmlStringWriter())
        {
            xmlStringWriter.WriteStartDocument();
            xmlStringWriter.Writer.WriteStartElement("Payload");

            xmlStringWriter.Writer.WriteRaw(Convert.ToBase64String(payLoad));
            xmlStringWriter.Writer.WriteEndElement();
            xmlStringWriter.WriteEndDocument();
            return xmlStringWriter.ToString();
        }
    }
}

I have a .zip file that is about 120 MB in size, and I get a System.OutOfMemoryException when calling Convert.ToBase64String().

So I went ahead and split the byte array into 20 MB chunks, hoping that it would not fail. But it only works for the first 3 iterations of the loop, i.e. it is able to convert 60 MB of the data, and in the 4th iteration I get the same exception. Sometimes I also get the exception at the line `return xmlStringWriter.ToString()`.
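For a sense of scale, here is a small sketch of the size arithmetic. `Base64Size` is a hypothetical helper, and it counts only the Base64 string itself, not the other buffers involved. Base64 turns every 3 input bytes into 4 characters (rounding the last group up), and a .NET `string` stores each character as 2 UTF-16 bytes:

```csharp
using System;

static class Base64Size
{
    // Base64 encodes each group of 3 input bytes as 4 output characters;
    // a final partial group is rounded up to 4 characters with '=' padding.
    public static long EncodedChars(long inputBytes) => 4 * ((inputBytes + 2) / 3);

    // A .NET string uses 2 bytes (UTF-16) per char, so the character data
    // alone needs twice as many bytes as there are Base64 characters.
    public static long StringBytes(long inputBytes) => 2 * EncodedChars(inputBytes);
}
```

For the 120 MB file (125,829,120 bytes), `EncodedChars` gives 167,772,160 characters and `StringBytes` gives 335,544,320 bytes, i.e. about 320 MB for one contiguous string, on top of the 120 MB `byte[]`. If `XmlStringWriter` wraps a `StringWriter` (its code is not shown), its internal buffer and the copy made by `ToString()` come on top of that, which would be consistent with the exception sometimes appearing at that line.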

To split the byte[] I have used the following extension methods:

using System;
using System.Collections.Generic;

public static class ArrayExtensions
{
    // Returns a copy of `length` elements starting at `index`. If fewer elements
    // remain, the slice is shorter, or padded with default(T) when padToLength is true.
    public static T[] CopySlice<T>(this T[] source, int index, int length, bool padToLength = false)
    {
        int n = length;
        T[] slice = null;

        if (source.Length < index + length)
        {
            n = source.Length - index;
            if (padToLength)
            {
                slice = new T[length];
            }
        }

        if (slice == null) slice = new T[n];
        Array.Copy(source, index, slice, 0, n);
        return slice;
    }

    // Lazily yields consecutive slices of `count` elements each.
    public static IEnumerable<T[]> Slices<T>(this T[] source, int count, bool padToLength = false)
    {
        for (var i = 0; i < source.Length; i += count)
        {
            yield return source.CopySlice(i, count, padToLength);
        }
    }
}

I got the above code from the Stack Overflow question "Splitting a byte[] into multiple byte[] arrays in C#".

The funny part is that the program runs fine as a console application, but when I put this code into the Windows application it throws the System.OutOfMemoryException.

Camilo Terevinto
user6520378
  • You should use a `Stream` instead of `byte[]`. – SLaks Jul 26 '18 at 21:45
  • Project > Properties > Build tab, untick the "Prefer 32-bit" checkbox. You don't prefer it. – Hans Passant Jul 26 '18 at 21:45
  • Notice that attempting to calculate the Base64 of slices won't give the same result. Not even close: https://dotnetfiddle.net/IUtlzH – Camilo Terevinto Jul 26 '18 at 21:51
  • Your algorithm for splitting is terrible. You are expecting the sizes of the split files to always be equal. The last split can be a different size. – jdweng Jul 26 '18 at 21:52
  • In general, it's a bad idea to try to manipulate such huge amounts of data in memory all at once; string builders, xml builders, and so on, are not designed for this scenario. My advice would be to find or implement a *streaming* builder that dumps directly out to disk rather than building up such huge structures in memory. – Eric Lippert Jul 26 '18 at 21:57
  • That's an antipattern too. At this point not putting the zip file in the xml seems better. – Joshua Jul 26 '18 at 22:18
  • Can I ask why you are trying to put a 120MB zip file in an XML document? – Dan Wilson Jul 26 '18 at 22:49
  • @DanWilson there can be lots of reasons, I imagine. Heck, I use Ogg containers for save files. (It's fun watching the people at uni thinking they found some top-secret audio file.) But back to XML: they could be using the XML file to store many attributes for the assets the zip contains. – Courtney The coder Jul 27 '18 at 00:51
  • If you are running a 64-bit OS and have enough virtual memory, then, as @HansPassant advised, try Project > Properties > Build tab, untick the "Prefer 32-bit" checkbox, and then run the app again. – Sree Harsha Nellore Jul 27 '18 at 04:06
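Following the streaming suggestions in the comments (SLaks, Eric Lippert), here is a minimal sketch that writes the `<Payload>` element straight to a file instead of building a string, assuming the goal is an XML file on disk with the same shape that `ToXml` produces. `ZipPayloadWriter` and `Write` are hypothetical names. `XmlWriter.WriteBase64` can be called repeatedly with partial buffers, so memory use stays at roughly one buffer no matter how large the zip file is:

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: streams a binary file into <Payload> as Base64.
static class ZipPayloadWriter
{
    public static void Write(string inputPath, string outputPath, int bufferSize = 81920)
    {
        using (FileStream input = File.OpenRead(inputPath))
        using (XmlWriter writer = XmlWriter.Create(outputPath))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Payload");

            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // WriteBase64 keeps any leftover bytes between calls, so the
                // chunk sizes do not need to be multiples of 3.
                writer.WriteBase64(buffer, 0, read);
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

A call would look like `ZipPayloadWriter.Write(importFilePath, outputPath)`, where `outputPath` is wherever the XML should go. Writing to a file (or any `Stream`) is what keeps memory flat; if the caller truly needs the whole XML as one `string`, the large allocation cannot be avoided.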

1 Answer

Preferably, you want to be doing something like this:

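            // Use a buffer whose size is a multiple of 3: full chunks then encode
            // without '=' padding and can be concatenated into valid Base64.
            byte[] Packet = new byte[3 * 4096];
            using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
            {
                int i = Packet.Length;
                while (i == Packet.Length)
                {
                    // Stops after the first short read, i.e. at end of file
                    // (assumes Read only returns a partial buffer at EOF).
                    i = fs.Read(Packet, 0, Packet.Length);
                    string b64str = Convert.ToBase64String(Packet, 0, i);
                    // Write b64str to your XML output here (for example with
                    // XmlWriter.WriteRaw) instead of keeping it all in memory.
                }
            }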

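With each b64str chunk you should build your XML data, writing it out as you go. Also, it is typically unwise to allocate 20 MB all in one go: arrays that large are allocated on the large object heap (not the stack), and a 32-bit process may not be able to find a contiguous block of that size.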
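Concatenating per-chunk Base64 strings is only safe when every chunk except the last is a multiple of 3 bytes long. Neither 4096 nor 20 MB (20,971,520 bytes) is, which is also why the per-slice results in Camilo Terevinto's comment differ from encoding the whole file. A small sketch to check this (`ChunkedBase64` is a hypothetical helper; it collects the chunks in a `StringBuilder` only to compare them, whereas real code would write each chunk out):

```csharp
using System;
using System.Text;

static class ChunkedBase64
{
    // Encodes `data` in chunks of `chunkSize` bytes and concatenates the results.
    // With a chunkSize that is a multiple of 3, only the last chunk can carry '='
    // padding, so the result equals Convert.ToBase64String(data). Otherwise every
    // full chunk ends in padding and the concatenation differs once there is more
    // than one chunk.
    public static string Encode(byte[] data, int chunkSize)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            sb.Append(Convert.ToBase64String(data, offset, count));
        }
        return sb.ToString();
    }
}
```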

  • @user6520378 np. Also, study this: it is the most common way to read files in C#. Most people read part of the stream and do something with it. – Courtney The coder Jul 27 '18 at 20:08