
I am attempting to query the database and pull Excel files that could be as large as 1 million rows (~200 MB), stored as varbinary, and pass them through a validator.

Our build server has 6 GB of memory and a load-balanced processor, and at runtime the process comes nowhere near maxing out the CPU or memory.

Yet, after about 40 seconds the process throws an OutOfMemoryException.

Here's the stack trace:

    System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
       at System.Data.SqlTypes.SqlBinary.get_Value()
       at System.Data.SqlClient.SqlBuffer.get_ByteArray()
       at System.Data.SqlClient.SqlBuffer.get_Value()
       at System.Data.SqlClient.SqlDataReader.GetValueFromSqlBufferInternal(SqlBuffer data, _SqlMetaData metaData)
       at System.Data.SqlClient.SqlDataReader.GetValueInternal(Int32 i)
       at System.Data.SqlClient.SqlDataReader.GetValue(Int32 i)
       at System.Data.SqlClient.SqlCommand.CompleteExecuteScalar(SqlDataReader ds, Boolean returnSqlValue)
       at System.Data.SqlClient.SqlCommand.ExecuteScalar()
       at eConfirmations.DataService.FileServices.FileDataService.GetFileContent(Guid fileId) in d:\w1\3\s\Source\eConfirmations.DataService\FileServices\FileDataService.cs:line 157

And here's my code that throws the exception:

    private byte[] GetFileContent(Guid fileId)
    {
        byte[] content;
        string connectionString = ConfigurationManager.ConnectionStrings["eConfirmationsDatabase"].ConnectionString;

        using (SqlConnection sqlConnection = new SqlConnection(connectionString))
        {
            using (SqlCommand sqlCommand = sqlConnection.CreateCommand())
            {
                sqlCommand.CommandTimeout = 300;
                sqlCommand.CommandText = $"SELECT Content FROM dbo.[File] WHERE FileId = '{fileId}'";
                sqlConnection.Open();
                content = sqlCommand.ExecuteScalar() as byte[];
                sqlConnection.Close();
                sqlCommand.Dispose();
            }
            sqlConnection.Dispose();
        }
        return content;
    }

Is there a more efficient way to pull back this data or can we update a setting on our build server to avoid this error?

Nathan Foss
  • How much heap space does the C# environment have available? – Carcigenicate Apr 19 '17 at 14:23
  • @Carcigenicate [C# Increase Heap Size - Is It Possible](http://stackoverflow.com/a/2325396/402022). – Theraot Apr 19 '17 at 14:24
  • 1
    Or a SqlDataReader with the SequentialAccess flag would allow you to stream the data. – Alex K. Apr 19 '17 at 14:28
  • It looks like a few other people have got mysterious OutOfMemoryExceptions at the same point. For instance see [Why do I get a System.OutOfMemoryException using NHibernate?](http://stackoverflow.com/questions/16170293/why-do-i-get-a-system-outofmemoryexception-using-nhibernate). None of the search results I found have a solution. I suspect there is an artificial constraint on the size of the result. Btw, you know you are double (actually triple) disposing, right? – Theraot Apr 19 '17 at 14:33
  • @Theraot Triple disposing was to make sure there were no memory leaks – Nathan Foss Apr 19 '17 at 14:44
  • I know it may be a "duhh" but have you checked a level up in your code to see if what's calling this is somehow looping? I can't tell you how many times I've found bugs in code where the same function was called more than once unnecessarily, but the stack trace didn't reflect it. – CDove Apr 19 '17 at 15:22
  • @CDove thanks for the input, I'm sure this is only running once and not looping through. I just can't speak for the SqlDataReader class and what it's doing behind the scenes – Nathan Foss Apr 19 '17 at 15:30
  • I would recommend instead of returning a big byte array, send in a target stream to this function and write to that. – Magnus Apr 19 '17 at 15:55
  • 1
    Shouldn't the first parameter in `GetBytes` by 0 rather than 1? – Magnus Apr 19 '17 at 15:57

1 Answer


Ok so here's what's happening:

Because this is running as a 32-bit build, the maximum memory allocation is 2 GB, but I'm still coming nowhere near that threshold.

According to this Stack Overflow post, which describes a situation very similar to mine, the .NET Framework effectively limits these objects to around 256 MB in memory.

So even though my file is only ~200 MB, byte[] buffers and MemoryStreams grow by powers of 2 until they reach the 256 MB needed. Each time they grow, a new, larger buffer is allocated and the old data is copied into it, so the old and new buffers are alive at the same time, roughly tripling the memory usage, which causes the exception.
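
To see that growth behavior concretely, here is a small illustrative sketch (not from the original post; numbers chosen to match the ~200 MB file) comparing a MemoryStream that grows by doubling with one pre-sized to the final length:

    // Illustrative only: MemoryStream doubles its internal buffer as it fills,
    // so the last growth step briefly keeps a 128 MB and a 256 MB buffer alive.
    // Requires System and System.IO.
    byte[] chunk = new byte[1024 * 1024]; // 1 MB per write

    using (var growing = new MemoryStream()) // capacity doubles as data is appended
    {
        for (int i = 0; i < 200; i++) growing.Write(chunk, 0, chunk.Length);
        Console.WriteLine(growing.Capacity);  // 268,435,456 (256 MB) for 200 MB of data
    }

    using (var presized = new MemoryStream(200 * 1024 * 1024)) // one allocation up front
    {
        for (int i = 0; i < 200; i++) presized.Write(chunk, 0, chunk.Length);
        Console.WriteLine(presized.Capacity); // 209,715,200 (exactly 200 MB)
    }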

MSDN has an example of how to retrieve a large file by streaming it into a FileStream, but instead of a FileStream I use a byte[] pre-allocated to the exact size of my data, following this post.
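
For reference, that MSDN approach streams the column straight to disk instead of holding it in memory. Roughly, the pattern looks like this (a sketch, not the exact MSDN code; the method name and output path parameter are just illustrative):

    // Sketch: stream the varbinary column to disk in small chunks.
    // Requires System.Data, System.Data.SqlClient, System.IO, System.Configuration.
    private void SaveFileContentToDisk(Guid fileId, string outputPath)
    {
        string connectionString = ConfigurationManager.ConnectionStrings["Database"].ConnectionString;
        using (var connection = new SqlConnection(connectionString))
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "SELECT Content FROM dbo.[File] WHERE FileId = @fileId";
            command.Parameters.Add(new SqlParameter("@fileId", SqlDbType.UniqueIdentifier) { Value = fileId });
            connection.Open();

            // SequentialAccess tells SqlDataReader not to buffer the whole column.
            using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
            using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
            {
                var buffer = new byte[81920]; // small reusable chunk
                while (reader.Read())
                {
                    long offset = 0;
                    long read;
                    while ((read = reader.GetBytes(0, offset, buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, (int)read);
                        offset += read;
                    }
                }
            }
        }
    }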

Here is my final solution:

    public File GetFileViaFileIdGuid(Guid fileId)
    {
        File file = new File();
        string connectionString = ConfigurationManager.ConnectionStrings["Database"].ConnectionString;
        using (var sourceSqlConnection = new SqlConnection(connectionString))
        {
            using (SqlCommand sqlCommand = sourceSqlConnection.CreateCommand())
            {
                // Fetch the metadata plus DATALENGTH(Content) so the content buffer
                // can be allocated once, at its final size.
                sqlCommand.CommandText = $"SELECT FileName, FileExtension, UploadedDateTime, DATALENGTH(Content) as [ContentLength] FROM dbo.[File] WHERE FileId = '{fileId}'";
                sqlCommand.CommandType = CommandType.Text;
                sqlCommand.CommandTimeout = 300;
                sourceSqlConnection.Open();

                using (var reader = sqlCommand.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        file.FileId = fileId;
                        file.FileExtension = reader["FileExtension"].ToString();
                        file.FileName = reader["FileName"].ToString();
                        file.UploadedDateTime = (DateTime)reader["UploadedDateTime"];
                        file.Content = new byte[Convert.ToInt32(reader["ContentLength"])];
                    }
                }
            }
        }
        file.Content = GetFileContent(file.FileId, file.Content.Length);
        return file;
    }

And to fetch the Content:

    private byte[] GetFileContent(Guid fileId, int contentLength)
    {
        // Read the varbinary column in 1 MB chunks directly into a buffer that is
        // allocated once, so nothing has to grow and re-copy along the way. The
        // buffer is padded by one chunk so the final GetBytes call always has room.
        int outputSize = 1048576;
        int bufferSize = contentLength + outputSize;
        byte[] content = new byte[bufferSize];
        string connectionString = ConfigurationManager.ConnectionStrings["Database"].ConnectionString;

        using (SqlConnection sqlConnection = new SqlConnection(connectionString))
        {
            using (SqlCommand sqlCommand = sqlConnection.CreateCommand())
            {
                sqlCommand.CommandTimeout = 300;
                sqlCommand.CommandText = $"SELECT Content FROM dbo.[File] WHERE FileId = '{fileId}'";
                sqlConnection.Open();

                // SequentialAccess streams the column instead of buffering the whole value.
                using (SqlDataReader reader = sqlCommand.ExecuteReader(CommandBehavior.SequentialAccess))
                {
                    while (reader.Read())
                    {
                        int startIndex = 0;
                        long returnValue = reader.GetBytes(0, startIndex, content, startIndex, outputSize);
                        while (returnValue == outputSize)
                        {
                            startIndex += outputSize;
                            returnValue = reader.GetBytes(0, startIndex, content, startIndex, outputSize);
                        }
                    }
                }
            }
        }
        return content;
    }
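
As a side note, on .NET 4.5 and later `SqlDataReader.GetStream` together with `CommandBehavior.SequentialAccess` can handle the chunked copy for you, and writing into a caller-supplied stream (as Magnus suggested in the comments) avoids both the manual `GetBytes` loop and the oversized buffer. A rough sketch, assuming the same table and column names as above:

    // Sketch only: stream the Content column via GetStream (.NET 4.5+) into a
    // destination stream supplied by the caller.
    // Requires System.Data, System.Data.SqlClient, System.IO, System.Configuration.
    private void CopyFileContentTo(Guid fileId, Stream destination)
    {
        string connectionString = ConfigurationManager.ConnectionStrings["Database"].ConnectionString;
        using (var connection = new SqlConnection(connectionString))
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "SELECT Content FROM dbo.[File] WHERE FileId = @fileId";
            command.Parameters.Add(new SqlParameter("@fileId", SqlDbType.UniqueIdentifier) { Value = fileId });
            command.CommandTimeout = 300;
            connection.Open();

            using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                if (reader.Read())
                {
                    // GetStream exposes the varbinary column as a Stream that is read
                    // in chunks; CopyTo moves it with a small reusable buffer.
                    using (var source = reader.GetStream(0))
                    {
                        source.CopyTo(destination);
                    }
                }
            }
        }
    }

Calling it with a `MemoryStream` pre-sized from `DATALENGTH` (or with a `FileStream`) keeps the peak allocation at roughly the size of the file itself.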
Nathan Foss
  • If your validator runs in a separate process, you can stream the file directly to disk or pipe the stream to the other process. That lets you grab the file in tiny chunks. – Berin Loritsch Apr 19 '17 at 20:01
  • @BerinLoritsch Unfortunately not. I'm using `Aspose.Cells` to create an excel file in memory and it will fail if it doesn't have the entire content – Nathan Foss Apr 19 '17 at 20:17
  • Is the reason you are building a 32-bit app because of the `Aspose.Cells` library? As a somewhat related data point, I found the largest 32-bit C# app ends up being around 1.4 GB before running into memory issues. Remember that you also have to have room to garbage collect, and there are multiple GC buckets; if any of them fill up you will get an `OutOfMemoryException`. Lastly, if you use WinForms and attempt to draw something that is smaller than 1 pixel you also get an `OutOfMemoryException`. That was the most surprising thing I ran across in dealing with memory problems. – Berin Loritsch Apr 19 '17 at 23:26