
I am trying to stream the contents of a file. The code works for smaller files, but with larger files, I get an Out of Memory error.

public void StreamEncode(FileStream inputStream, TextWriter tw)
{
    byte[] base64Block = new byte[BLOCK_SIZE];
    int bytesRead = 0;

    try
    {
        do
        {
            // read one block from the input stream
            bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);

            // encode the base64 string
            string base64String = Convert.ToBase64String(base64Block, 0, bytesRead);

            // write the string
            tw.Write(base64String);

        } while (bytesRead == base64Block.Length);
    }
    catch (OutOfMemoryException)
    {
        MessageBox.Show("Error -- Memory used: " + GC.GetTotalMemory(false) + " bytes");
    }
}

I can isolate the problem and watch the memory used grow as it loops.
The problem seems to be the call to Convert.ToBase64String().

How can I free the memory for the converted string?


Edited from here down... Here is an update. I also created a new thread about this -- sorry, I guess that was not the right thing to do.

Thanks for your great suggestions. Following them, I shrunk the buffer size used to read from the file, and memory consumption looks better, but I'm still seeing an OOM problem with file sizes as small as 5 MB, and I potentially want to deal with files ten times larger.

My problem seems now to be with the use of TextWriter.

I create a request as follows [with a few edits to shrink the code]:

HttpWebRequest oRequest = (HttpWebRequest)WebRequest.Create(new Uri(strURL));
oRequest.Method = httpMethod;
oRequest.ContentType = "application/atom+xml";
oRequest.Headers["Authorization"] = getAuthHeader();
oRequest.ContentLength = strHead.Length + strTail.Length + longContentSize;
oRequest.SendChunked = true;

using (TextWriter tw = new StreamWriter(oRequest.GetRequestStream()))
{
    tw.Write(strHead);
    using (FileStream fileStream = new FileStream(strPath, FileMode.Open, 
           FileAccess.Read, System.IO.FileShare.ReadWrite))
    {
        StreamEncode(fileStream, tw);
    }
    tw.Write(strTail);
}
.....

Which calls into the routine:

public void StreamEncode(FileStream inputStream, TextWriter tw)
{
    // For Base64 there are 4 bytes output for every 3 bytes of input
    byte[] base64Block = new byte[9000];
    int bytesRead = 0;
    string base64String = null;

    do
    {
        // read one block from the input stream
        bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);

        // encode the base64 string
        base64String = Convert.ToBase64String(base64Block, 0, bytesRead);

        // write the string
        tw.Write(base64String);


    } while (bytesRead != 0);

}

Should I use something other than TextWriter because of the potentially large content? It seems very convenient for building the whole payload of the request.

Is this totally the wrong approach? I want to be able to support very large files.

  • You shouldn't be catching OutOfMemoryException (in fact, in .NET4, you can't, at least without resorting to [syntactic salt](http://en.wikipedia.org/wiki/Syntactic_sugar#Syntactic_salt) ). Now, regarding the question... What is the BLOCK_SIZE, what TextWriter are you using, and how many bytes are read? One or more of these could be the culprit. – R. Martinho Fernandes Mar 25 '11 at 17:41
  • Remember that [a base-64 string is longer than the original data](http://stackoverflow.com/questions/4715415/base64-what-is-the-worst-possible-increase-in-space-usage/4715480#4715480). – R. Martinho Fernandes Mar 25 '11 at 17:52
  • I added the try/catch for helping diagnose this problem. It wasn't in the way I originally wrote it. – George Mar 25 '11 at 18:14
  • The BLOCK_SIZE value is 54000. I decreased it to 30000, but the memory still grows -- but it takes more loops now since it is smaller. – George Mar 25 '11 at 18:16
  • The block size is probably 32 kB or larger, producing too many large strings in the LOH. A smaller size is fine; Convert.ToBase64CharArray() is best. – Hans Passant Mar 25 '11 at 18:18
  • Is this loop on the main thread or a background thread? I wonder if your loop is not allowing the GC to run. – SwDevMan81 Mar 25 '11 at 18:18
  • @George: to be sure try something *drastically* smaller, like 1024. If that doesn't work... – R. Martinho Fernandes Mar 25 '11 at 18:21
  • This code is intended to eventually run on a background thread. After the problem has come up, I've isolated the code and still see the problem when it runs in the main thread. – George Mar 25 '11 at 18:27
  • I reduced BLOCK_SIZE to 8192. That seems better, but I still see memory growing. If I then comment out the write (so that it only reads and converts), the memory stays pretty constant -- that's not what I observed before reducing the BLOCK_SIZE. So I think the TextWriter is a problem too. It is writing to an HttpWebRequest / Web Service. – George Mar 25 '11 at 18:32
  • TextWriter tw = new StreamWriter(oRequest.GetRequestStream()) – George Mar 25 '11 at 18:36
  • Sorry. I've updated the previous thread with this additional information. – George Mar 25 '11 at 20:13
  • possible duplicate of [How to free up memory after base64 convert](http://stackoverflow.com/questions/5436064/how-to-free-up-memory-after-base64-convert) – ChaosPandion Mar 25 '11 at 20:26
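A sketch of the Convert.ToBase64CharArray approach Hans Passant suggests above: it reuses a char buffer, so no per-block string is allocated at all (buffer sizes here are illustrative, and inputStream/tw are the question's own parameters):

// Reuse one byte buffer and one char buffer across iterations.
// Convert.ToBase64CharArray fills the char buffer and returns the
// number of characters produced, so no strings are allocated.
byte[] readBuffer = new byte[4608];                      // multiple of 3
char[] charBuffer = new char[readBuffer.Length / 3 * 4];
int bytesRead;
while ((bytesRead = inputStream.Read(readBuffer, 0, readBuffer.Length)) > 0)
{
    int charCount = Convert.ToBase64CharArray(readBuffer, 0, bytesRead, charBuffer, 0);
    tw.Write(charBuffer, 0, charCount);
}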

7 Answers


If you use a BLOCK_SIZE of 32 kB or more, you will be creating strings of 85 kB or more, which are allocated on the large object heap. Short-lived objects should live in the regular heaps, not the large object heap, so that may be the reason for the memory problems.

Also, I see two potential problems with the code:

  • The base64 encoding uses padding at the end of the string, so if you chop up a stream into pieces, convert each piece to a base64 string, and then write the strings to a stream, you don't end up with a single valid base64 stream (unless each piece's length is a multiple of 3).

  • Checking whether the number of bytes returned by the Read method equals the number of requested bytes is not the proper way to detect the end of the stream. The Read method may return fewer bytes than requested whenever it chooses, and the correct way to check for the end of the stream is when the method returns zero, as in the sketch below.
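A minimal sketch that addresses both points, assuming a FileStream source (when reading a local file, Read fills the buffer except on the last chunk, so padding can only appear at the very end):

// Illustrative only: the buffer length is a multiple of 3, so every
// full chunk encodes without padding; the loop stops when Read returns 0.
byte[] buffer = new byte[3 * 1024];
int bytesRead;
while ((bytesRead = inputStream.Read(buffer, 0, buffer.Length)) > 0)
{
    tw.Write(Convert.ToBase64String(buffer, 0, bytesRead));
}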

Guffa
  • To expand on the two additional issues you pointed out: the OP will need to add logic to ensure that only inputs with sizes that are multiples of 3 are passed into the Convert.ToBase64String method. The last block can be any size, though. – R. Martinho Fernandes Mar 25 '11 at 18:08
  • These are awesome suggestions in general, but it should not cause OOM by itself in this case as these blocks in this code will be collected due to memory allocation request. – Alexei Levenkov Mar 25 '11 at 18:38
  • @Alexei Levenkov: That works fine for the regular heap, but the large object heap is managed differently. It can get fragmented by frequent allocations and deallocations, and it never gets totally cleaned out the way a regular heap generation can be. – Guffa Mar 25 '11 at 20:07
  • Between a combination of lowering the BLOCK_SIZE and setting the AllowWriteStreamBuffering flag on the request object, the memory problem has gone away for me. Thanks everyone! – George Mar 25 '11 at 20:50

Keep in mind that when converting data to base64, the resulting string will be about 33% longer (assuming the input size is a multiple of 3, which is probably a good idea in your case). If BLOCK_SIZE is too large, there might not be enough contiguous memory to hold the resulting base64 string.

Try reducing BLOCK_SIZE, so that each piece of the base-64 is smaller, making it easier to allocate the memory for it.

However, if you're using an in-memory TextWriter like a StringWriter, you may run into the same problem, because it would fail to find a block of memory large enough to hold the internal buffer. If you're writing to something like a file, this should not be a problem, though.
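For concreteness, a rough back-of-the-envelope check using the 9000-byte block size from the updated question (the numbers are only illustrative):

const int BLOCK_SIZE = 9000;                      // multiple of 3
int encodedChars = BLOCK_SIZE / 3 * 4;            // 12,000 base64 chars per block
int stringBytes  = encodedChars * sizeof(char);   // ~24,000 bytes per string; well under the ~85 KB LOH threshold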

R. Martinho Fernandes

Wild guess... HttpWebRequest.AllowWriteStreamBuffering is true by default, and according to MSDN, "setting AllowWriteStreamBuffering to true might cause performance problems when uploading large datasets because the data buffer could use all available memory". Try setting oRequest.AllowWriteStreamBuffering = false and see what happens; a sketch follows.
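In the context of the request setup from the question, that would look something like this (a sketch reusing the question's own strURL and httpMethod variables; with buffering disabled, the body must be sent chunked or with a known length):

HttpWebRequest oRequest = (HttpWebRequest)WebRequest.Create(new Uri(strURL));
oRequest.Method = httpMethod;
oRequest.SendChunked = true;                 // stream the body as it is produced
oRequest.AllowWriteStreamBuffering = false;  // don't buffer the whole body in memory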

PoppaVein
  • Thanks for your guess PoppaVein! That did it. I examined the memory during the upload of a 7MB file and memory hovered in a low band, and then the upload was successful. – George Mar 25 '11 at 20:46

Try pulling your base64String declaration out of the loop. If that still doesn't help, try calling the garbage collector every so many iterations, as sketched below.

GC.Collect(); GC.WaitForPendingFinalizers();
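For instance (an illustrative sketch only; the interval of 100 is arbitrary):

int iteration = 0;
// inside the encode loop:
if (++iteration % 100 == 0)   // every 100 blocks, say
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
}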

TyCobb
  • Calling GC manually from code is not recommended. See http://blogs.msdn.com/b/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx for more info. – Morten Mertner Mar 25 '11 at 17:50
  • This is true, but it is worth a shot. When you create a custom class with the IDisposable interface, the collector is sometimes called on the Dispose() method. I have had no issues with doing that. – TyCobb Mar 25 '11 at 17:54
  • Why would this work? If this works, it means the GC was sleeping on the job, because the system was low on memory and yet the GC didn't attempt to reclaim some on its own. I highly doubt that. – R. Martinho Fernandes Mar 25 '11 at 17:58
  • I tried pulling base64String out of the loop, but it didn't help. – George Mar 25 '11 at 18:06
  • Then I added Garbage collection and saw a different behavior. I thought it was better, but it really doesn't help. The memory stays constant for a large number of loops and then makes big jumps. 23MB .... 27MB .... 31MB .... 40MB ... Out of Memory. Without the GC, the upward change was gradual. – George Mar 25 '11 at 18:09
  • Check Martinho Fernandes and my reasoning - likley you are using memory-based stream (MemoryStream or StringStream) as destination and it had to grow its buffer by factor of 2 every time it needs to grow. For huge files it will soon fail to find large enough continuous chunk of memory and you get OOM. – Alexei Levenkov Mar 25 '11 at 18:34
  • Reducing BLOCK_SIZE seems to have helped some. I still get OOM errors though. I initially create an HttpWebRequest object. And from it call GetRequestStream() and get a TextWriter. What else can I use? – George Mar 25 '11 at 19:02

Try reducing the block size or avoid assigning the result of the Convert call to a variable:

bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);
tw.Write(Convert.ToBase64String(base64Block, 0, bytesRead));
Morten Mertner

The code looks OK from a memory-usage point of view, but I think you are passing a writer for a memory-based stream (like MemoryStream), and storing the data there is what causes the OOM exception.

If BLOCK_SIZE is 85,000 bytes or more, allocations will happen on the Large Object Heap (LOH). That changes allocation behavior, but it should not cause OOM by itself.

Note: your end condition is not correct; it should be bytesRead != 0. In general, Read can return fewer bytes than asked for even if there is more data left, although to my knowledge FileStream never does this.

Alexei Levenkov

I would write the result to a temp file first.

using (TextWriter tw = new StreamWriter(oRequest.GetRequestStream()))
{
    tw.Write(strHead);
    var tempPath = Path.GetTempFileName();
    try
    {
        using (var input = File.OpenRead(strPath))
        using (var output = File.Open(
            tempPath, FileMode.Open, FileAccess.ReadWrite))
        {
            StreamEncode(input, output);
            output.Seek(0, SeekOrigin.Begin);
            tw.Flush(); // flush strHead before writing raw bytes to the underlying stream
            CopyTo(output, ((StreamWriter)tw).BaseStream);
        }
    }
    finally
    {
        File.Delete(tempPath);
    }
    tw.Write(strTail);
}

public void StreamEncode(Stream inputStream, Stream output)
{
    // For Base64 there are 4 bytes output for every 3 bytes of input
    byte[] base64Block = new byte[9000];
    int bytesRead = 0;
    string base64String = null;

    // Flush rather than dispose: disposing the StreamWriter would close
    // the underlying temp-file stream, which the caller still needs.
    var tw = new StreamWriter(output);
    do
    {
        // read one block from the input stream
        bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);

        // encode the block as a base64 string
        base64String = Convert.ToBase64String(base64Block, 0, bytesRead);

        // write the string
        tw.Write(base64String);

    } while (bytesRead != 0);

    tw.Flush();

}


static void CopyTo(Stream input, Stream output)
{
    const int length = 10240;
    byte[] buffer = new byte[length];
    int count = 0;

    while ((count = input.Read(buffer, 0, length)) > 0)
        output.Write(buffer, 0, count);
}
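As a side note, .NET 4 added a built-in Stream.CopyTo, so on that framework the hand-rolled helper above could be replaced with:

// .NET 4+: copy the temp file straight into the request stream
output.Seek(0, SeekOrigin.Begin);
output.CopyTo(((StreamWriter)tw).BaseStream);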
ChaosPandion
  • Thanks for for taking the time to write this. Since the AllowWriteStreamBuffering flag is working for me on the request, I think I don't need it now. – George Mar 25 '11 at 20:47