I have existing code that has been used for years to upload an XML and TIF file pair via an HttpWebRequest POST request. Problem is, on large TIF files it chews through memory like a flock of beavers attacking a forest. I started digging into the code today in an attempt to make it more memory-efficient.
The existing code loads the XML and TIF content into a string, which is then converted into a byte array and fed into the HTTP request, with many string concatenations along the way. The TIF file is loaded and converted to a string like this, where br2 is a BinaryReader:
System.Text.Encoding.Default.GetString(br2.ReadBytes(tifByteCount))
I now know that using Encoding.Default is not wise, but changing that will require working with the client to change their decoding of the file submissions, so that is for another time. I will likely change to base64 encoding when I make that change. Anyway...
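When I do make that switch, something like the sketch below is what I have in mind. It's only a sketch: requestStream stands in for the HttpWebRequest's request stream, and the buffer size is arbitrary. ToBase64Transform encodes incrementally, so the whole file never has to sit in memory at once.

using System.IO;
using System.Security.Cryptography;

// Sketch only: stream a file into an already-open request stream as Base64.
// Reading in small buffers keeps memory use flat regardless of file size.
static void WriteFileAsBase64(string path, Stream requestStream)
{
    using (FileStream fs = File.OpenRead(path))
    using (CryptoStream b64 = new CryptoStream(
        requestStream, new ToBase64Transform(), CryptoStreamMode.Write))
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            b64.Write(buffer, 0, bytesRead);
        }
        b64.FlushFinalBlock(); // writes the Base64 padding for the final block
    }
}

(Note that disposing the CryptoStream also closes the underlying stream, which is fine when the file is the last thing written to the request.)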
The first thing I changed was all of my string concatenations, because I figured those were bogging things down, especially when working with the TIF string. I'm now using a StringBuilder and appending everything to it.
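A related tweak I'm considering, shown as a sketch below (xmlByteCount is a hypothetical counterpart to tifByteCount): presizing the StringBuilder to roughly the final request length, so it doesn't go through repeated grow-and-copy cycles as its internal buffer doubles.

using System.Text;

// Sketch: presize the builder to roughly the final request length so that
// appending ~185 MB of decoded characters doesn't force repeated
// reallocate-and-copy cycles as the internal buffer doubles.
int estimatedLength = xmlByteCount + tifByteCount + 1024; // rough body size
StringBuilder sbRequest = new StringBuilder(estimatedLength);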
I then searched for "byte array to string conversion" and tried several different results that I found, including this one and this one, but both used a different encoding than my existing code.
I then used a Decoder obtained from System.Text.Encoding.Default.GetDecoder() to decode the entire TIF file into a char[] array in one go. That didn't improve memory usage at all, but it did at least keep the same encoding.
The file I've been testing with today is a 185 MB TIF file. While testing on my dev machine, Windows physical memory usage would start at about 2 GB, quickly climb past 5 GB, max out at 5.99 GB, and promptly lock up until the debugger killed itself. As far as I could tell I was only loading a single instance of the TIF file into memory, so I couldn't understand why a 185 MB file was using up 4 GB of memory.
Next I tried loading the TIF file in much smaller chunks, 1000 bytes at a time. This looked promising initially: it used only 2 GB of memory while loading all but the last <1000 bytes of the file. On the final chunk (in this case 928 bytes), though, this line

charCount = dc.GetCharCount(ba2, x, (int)fileStream2.Length - x);

caused memory to momentarily spike by 1 GB, the following line

chars2 = new Char[(int)fileStream2.Length - x];

increased it by another 700 MB, and the next line

charsDecodedCount = dc.GetChars(ba2, x, (int)fileStream2.Length - x, chars2, 0);

pushed memory to the max and locked up the system.
The code below shows the last approach tried - the one described in the previous paragraph.
// dc is the Decoder for the same Encoding.Default used elsewhere;
// sbRequest is the StringBuilder that accumulates the request body.
Decoder dc = System.Text.Encoding.Default.GetDecoder();
int charCount;
int charsDecodedCount;

BinaryReader br2 = new BinaryReader(fileStream2);
// The entire file is still read into one byte array here;
// only the decoding below is done in chunks.
byte[] ba2 = br2.ReadBytes((int)fileStream2.Length);
Char[] chars2 = null;

if ((int)fileStream2.Length > 1000)
{
    // Decode the byte array 1000 bytes at a time.
    for (int x = 0; x < (int)fileStream2.Length; x += 1000)
    {
        if (x + 1000 > (int)fileStream2.Length)
        {
            // Final partial chunk: fewer than 1000 bytes remain.
            charCount = dc.GetCharCount(ba2, x, (int)fileStream2.Length - x);
            chars2 = new Char[(int)fileStream2.Length - x];
            charsDecodedCount = dc.GetChars(ba2, x, (int)fileStream2.Length - x, chars2, 0);
        }
        else
        {
            // Full 1000-byte chunk.
            charCount = dc.GetCharCount(ba2, x, 1000);
            chars2 = new Char[charCount];
            charsDecodedCount = dc.GetChars(ba2, x, 1000, chars2, 0);
        }
        sbRequest.Append(chars2);
        chars2 = null;
    }
}
else
{
    // Small file: decode everything in one call.
    charCount = dc.GetCharCount(ba2, 0, ba2.Length);
    chars2 = new Char[charCount];
    charsDecodedCount = dc.GetChars(ba2, 0, ba2.Length, chars2, 0);
    sbRequest.Append(chars2);
}
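For comparison, here is a sketch of what I think truly chunked reading would look like, since the code above still pulls the entire file into ba2 before decoding. The variable names and buffer size here are my own, not from the production code:

// Sketch: read AND decode 1000 bytes at a time, so only one small buffer
// and its decoded chars are alive at once. The StringBuilder still ends up
// holding the full decoded text, though.
Decoder dec = System.Text.Encoding.Default.GetDecoder();
byte[] buffer = new byte[1000];
int bytesRead;
while ((bytesRead = fileStream2.Read(buffer, 0, buffer.Length)) > 0)
{
    int count = dec.GetCharCount(buffer, 0, bytesRead);
    char[] decoded = new char[count];
    dec.GetChars(buffer, 0, bytesRead, decoded, 0);
    sbRequest.Append(decoded);
}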
I have a feeling I'm missing something fairly obvious, and I'd appreciate any advice on resolving this. I'd like to be able to load a 185 MB TIF file without using 4 GB of memory!