3

I couldn't fetch more than 700000 rows from SQL Server using C# - I get a "out-of-memory" exception. Please help me out.

This is my code:

using (SqlConnection sourceConnection = new SqlConnection(constr))
{
    sourceConnection.Open();

    SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);

    reader = commandSourceData.ExecuteReader();
}

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
{
    bulkCopy.DestinationTableName = "destinationTable";

    try
    {
        // Write from the source to the destination.
        bulkCopy.WriteToServer(reader);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    finally
    {
        reader.Close();
    }
}

I have made up small console App based on the given solution 1 but ends up with same exception also i have posted my Memory process Before and After Before Processing:enter image description here

After Processing

After adding the command timeout at the read code side, Ram Peaks up, enter image description here

user2302158
  • 417
  • 2
  • 6
  • 14
  • 5
    You can. The OOM comes for your code, not SQL Server. Configure SqlBulkCopy to send *batches* of records to the target – Panagiotis Kanavos Nov 26 '18 at 13:20
  • 1
    Batch it so you only upload x amount of records at once or buy more RAM –  Nov 26 '18 at 13:20
  • 1
    Possible duplicate of [C# : Out of Memory exception](https://stackoverflow.com/questions/8563933/c-sharp-out-of-memory-exception) – Liam Nov 26 '18 at 13:21
  • 2
    @PanagiotisKanavos Not sure what being out of mana has to do with anything –  Nov 26 '18 at 13:21
  • 1
    Needs a power up @Matt – Liam Nov 26 '18 at 13:22
  • As the [documentation of BatchSize](https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy.batchsize?view=netframework-4.7.2#remarks) explains `Zero (the default) indicates that each WriteToServer operation is a single batch.` – Panagiotis Kanavos Nov 26 '18 at 13:23
  • 1
    Is that destination table in the same database as the source? If so, use a command like `INSERT INTO target SELECT * FROM source" and keep the data on the server – Hans Kesting Nov 26 '18 at 13:29
  • Aside from my Answer, Exception handling is a pet peeve of mine. And yours is bad. You continue after a fatal exception, wich is the last thing you should **ever** do. You only get more and less understandable followup exceptions. Here are two articles on the mater I link often: https://blogs.msdn.microsoft.com/ericlippert/2008/09/10/vexing-exceptions/ | https://www.codeproject.com/Articles/9538/Exception-Handling-Best-Practices-in-NET – Christopher Nov 26 '18 at 13:49
  • Is your destination database on the same host as your C#'s running on? If so potentially there's competition between the two processes... Maybe consider setting a max memory for SQL to ensure your app has some wiggle room. https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/server-memory-server-configuration-options?view=sql-server-2017#providing-the-maximum-amount-of-memory-to-sql-server. That said, purely a guess. – JohnLBevan Nov 26 '18 at 20:07

3 Answers3

3

That code should not cause an OOM exception. When you pass a DataReader to SqlBulkCopy.WriteToServer you are streaming the rows from the source to the destination. Somewhere else you are retaining stuff in memory.

SqlBulkCopy.BatchSize controls how often SQL Server commits the rows loaded at the destination, limiting the lock duration and the log file growth (if not minimally logged and in simple recovery mode). Whether you use one batch or not should have no impact on the amount of memory used either in SQL Server or in the client.

Here's a sample that copies 10M rows without growing memory:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace SqlBulkCopyTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var src = "server=localhost;database=tempdb;integrated security=true";
            var dest = src;

            var sql = "select top (1000*1000*10) m.* from sys.messages m, sys.messages m2";

            var destTable = "dest";

            using (var con = new SqlConnection(dest))
            {
                con.Open();
                var cmd = con.CreateCommand();
                cmd.CommandText = $"drop table if exists {destTable}; with q as ({sql}) select * into {destTable} from q where 1=2";
                cmd.ExecuteNonQuery();
            }

            Copy(src, dest, sql, destTable);
            Console.WriteLine("Complete.  Hit any key to exit.");
            Console.ReadKey();
        }

        static void Copy(string sourceConnectionString, string destinationConnectionString, string query, string destinationTable)
        {
            using (SqlConnection sourceConnection = new SqlConnection(sourceConnectionString))
            {
                sourceConnection.Open();

                SqlCommand commandSourceData = new SqlCommand(query, sourceConnection);

                var reader = commandSourceData.ExecuteReader();

                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
                {
                    bulkCopy.BulkCopyTimeout = 60 * 10;
                    bulkCopy.DestinationTableName = destinationTable;
                    bulkCopy.NotifyAfter = 10000;
                    bulkCopy.SqlRowsCopied += (s, a) =>
                    {
                        var mem = GC.GetTotalMemory(false);
                        Console.WriteLine($"{a.RowsCopied:N0} rows copied.  Memory {mem:N0}");
                    };
                     // Write from the source to the destination.
                     bulkCopy.WriteToServer(reader);

                }
            }
        }


    }
}

Which outputs:

. . .
9,830,000 rows copied.  Memory 1,756,828
9,840,000 rows copied.  Memory 798,364
9,850,000 rows copied.  Memory 4,042,396
9,860,000 rows copied.  Memory 3,092,124
9,870,000 rows copied.  Memory 2,133,660
9,880,000 rows copied.  Memory 1,183,388
9,890,000 rows copied.  Memory 3,673,756
9,900,000 rows copied.  Memory 1,601,044
9,910,000 rows copied.  Memory 3,722,772
9,920,000 rows copied.  Memory 1,642,052
9,930,000 rows copied.  Memory 3,763,780
9,940,000 rows copied.  Memory 1,691,204
9,950,000 rows copied.  Memory 3,812,932
9,960,000 rows copied.  Memory 1,740,356
9,970,000 rows copied.  Memory 3,862,084
9,980,000 rows copied.  Memory 1,789,508
9,990,000 rows copied.  Memory 3,903,044
10,000,000 rows copied.  Memory 1,830,468
Complete.  Hit any key to exit.
David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
  • Thanks David; good example / I'd not realised that that's how SqlBulkCopy worked under the covers (though it makes sense thinking about it). – JohnLBevan Nov 26 '18 at 20:06
0

Something went horribly wrong in your design if you even try to process 700k Rows in C#. That you fail at this is to be expected.

If this is data retrieval for display: There is no way the user will be able to process that amount of data. And filtering down from 700k Rows in the GUI is just a waste of time and Bandwidth. 25-100 fields at once is about the limit. Do filtering or pagination on the Query side so you do not end up retrieving orders of magnitude more then you can actually process.

If this is some form of Bulk insert or Bulk modification: Do that kind of operation in the SQL Server, not in your code. Retrieving, processing in C# and then posting back just adds layers of Overhead. If you add the 2 way Network transfer, you will easily triple the time this will take.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Christopher
  • 9,634
  • 2
  • 17
  • 31
  • Looking at the code in the question, it's the second scenario. But OP uses different connection strings for reading and writing, so possibly OP is reading from server A and writing to server B, which means processing the data in C# isn't illogical. – user247702 Nov 26 '18 at 13:34
  • @Stijn: It still is. Just read data of Sever A from Server B. Especially if this is a bulk important/transfer, it should not be handeled in C# code. SQL Code run in Server B is going to be way more effective. – Christopher Nov 26 '18 at 13:39
  • There's no guarantee that server A and B can communicate directly. Your answer isn't wrong, but I think there's too much guessing here and OP needs to give some context. – user247702 Nov 26 '18 at 13:42
  • The downvotes are possibly because you don't actually give details of how to do it correctly, so it reads more as a comment than an answer. And as Stijn said, there is too much guessing as to the limitations the OP faces. Maybe give some details about how to set up linked servers and examples of SQL that copy rows from another database. – Dijkgraaf Nov 26 '18 at 20:16
  • I am not using data retrieval for display, I am just spliting up the data form one database to another based on the customers. I even tried up the above solution codes even though it results in same exception, Any other ideas guys? – user2302158 Nov 27 '18 at 10:37
  • @user2302158: I can only repeat that doing this in C# is propably the wrong way. On this scale (700k) it is practially impossible not to get an exception or run into some other limits. If it works as low as 10k-100k, it is basically down to pure luck. I am also wondering why you think you need to split the DB? Maybe what you want is actually a Distributed Databses in some form? – Christopher Nov 27 '18 at 21:54
  • My current requirement is to move huge amount data form one server to distributed Databases in a short period also it should be an automated process, so i preferred for console batch to transfer. Is the any other way apart from there batch to transfer the data? – user2302158 Nov 28 '18 at 06:04
  • @user2302158: That is the kind of thing you should be doing in the Database Server itself. This is a pure DB to DB opeartion. Most DMBS have proper support for Distributed Databases - they deal with transfering the data. Or should at least allow the sheduling for some SQL code, aka the one you should be writing. – Christopher Nov 28 '18 at 09:25
0

NB: Per DavidBrowne's answer, it seems I'd misunderstood how the batching of the SqlBulkCopy class works. The refactored code may still be useful to you, so I've not deleted this answer (as the code is still valid), but the answer is not to set the BatchSize as I'd believed. Please see David's answer for an explanation.


Try something like this; the key being setting the BatchSize property to limit how many rows you deal with at once:

using (SqlConnection sourceConnection = new SqlConnection(constr))
{
    sourceConnection.Open();
    SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);
    using (reader = commandSourceData.ExecuteReader() { //add a using statement for your reader so you don't need to worry about close/dispose

        //keep the connection open or we'll be trying to read from a closed connection

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
        {
            bulkCopy.BatchSize = 1000; //Write a few pages at a time rather than all at once; thus lowering memory impact.  See https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy.batchsize?view=netframework-4.7.2
            bulkCopy.DestinationTableName = "destinationTable";

            try
            {
                // Write from the source to the destination.
                bulkCopy.WriteToServer(reader);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                throw; //we've caught the top level Exception rather than somethign specific; so once we've logged it, rethrow it for a proper handler to deal with up the call stack
            }
        }
    }

}

Note that because the SqlBulkCopy class takes an IDataReader as an argument we don't need to download the full data set. Instead, the reader gives us a way to pull back records as required (hence us leaving the connection open after creating the reader). When we call the SqlBulkCopy's WriteToServer method, internally it has logic to loop multiple times, selecting BatchSize new records from the reader, then pushing those to the destination table before repeating / completing once the reader has sent all pending records. This works differently to, say, a DataTable, where we'd have to populate the data table with the full set of records, rather than being able to read more back as required.

One potential risk of this approach is, because we have to keep the connection open, any locks on our source are kept in place until we close our reader. Depending on the isolation level and whether other queries are trying to access the same records, this may cause blocking; whilst the data table approach would have taken a one-off copy of the data into memory and then closed the connection, avoiding any blocks. If this blocking is a concern you should look at changing the isolation level of your query, or applying hints... Exactly how you approach that would depend on the requirements though.

NB: In reality, instead of running the above code as is, you'd want to refactor things a bit, so the scope of each method is contained. That way you can reuse this logic to copy other queries to other tables. You'd also want to make the batch size configurable rather than hard-coded so you can adjust to a value that gives a good balance of resource usage vs performance (which will vary based on the host's resources).
You may also want to use async methods, to allow other parts of your program to progress whilst you're waiting on data to flow from/to your databases.

Here's a slightly amended version:

public Task<SqlDataReader> async ExecuteReaderAsync(string connectionString, string query) 
{
    SqlConnection connection;
    SqlCommand command; 
    try 
    {
        connection = new SqlConnection(connectionString); //not in a using as we want to keep the connection open until our reader's finished with it.
        connection.Open();
        command = new SqlCommand(query, connection);
        return await command.ExecuteReaderAsync(CommandBehavior.CloseConnection);  //tell our reader to close the connection when done.
    } 
    catch 
    {
        //if we have an issue before we've returned our reader, dispose of our objects here
        command?.Dispose();
        connection?.Dispose();
        //then rethrow the exception
        throw;
    }
}
public async Task CopySqlDataAsync(string sourceConnectionString, string sourceQuery, string destinationConnectionString, string destinationTableName, int batchSize)
{
    using (var reader = await ExecuteReaderAsync(sourceConnectionString, sourceQuery))
        await CopySqlDataAsync(reader, destinationConnectionString, destinationTableName, batchSize);
}
public async Task CopySqlDataAsync(IDataReader sourceReader, string destinationConnectionString, string destinationTableName, int batchSize)
{
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        bulkCopy.BatchSize = batchSize; 
        bulkCopy.DestinationTableName = destinationTableName;
        await bulkCopy.WriteToServerAsync(sourceReader);
    }
}
public void CopySqlDataExample()
{
    try 
    {
        var constr = ""; //todo: define connection string; ideally pulling from config 
        var constr2 = ""; //todo: define connection string #2; ideally pulling from config 
        var batchSize = 1000; //todo: replace hardcoded batch size with value from config
        var task = CopySqlDataAsync(constr, "select * from XXXX", constr2, "destinationTable", batchSize); 
        task.Wait(); //waits for the current task to complete / if any exceptions will throw an aggregate exception
    } 
    catch (AggregateException es)
    {
        var e = es.InnerExceptions[0]; //get the wrapped exception 
        Console.WriteLine(e.Message);
        //throw; //to rethrow AggregateException 
        ExceptionDispatchInfo.Capture(e).Throw(); //to rethrow the wrapped exception
    }
}
JohnLBevan
  • 22,735
  • 13
  • 96
  • 178
  • 1
    Tks for the reply, After inserting 200k record end up with same exception, Please advice me – user2302158 Nov 27 '18 at 10:51
  • @user2302158 can you see via a debugger on which line you're getting the out of memory exception? Are the source or destination databases running on the same host as your C# code? How much memory is your process using at the time of error? What other processes are competing for this memory? Any info along those lines would help us narrow down possible causes. Thanks in advance – JohnLBevan Nov 27 '18 at 10:57
  • 1
    I just used console application to transfer the data form one DB to another. Error occurs at this line bulkCopy.WriteToServer(reader); were as in the DavidBrowne's answer it occurs at bulkCopy.SqlRowsCopied += (s, a) => { var mem = GC.GetTotalMemory(false); Console.WriteLine($"{a.RowsCopied:N0} rows copied. Memory {mem:N0}"); }; – user2302158 Nov 27 '18 at 11:09
  • Thanks; and how much memory does your machine have / how much is your process using / is there another process that seems to be using more? – JohnLBevan Nov 27 '18 at 11:23
  • Around 260 GB ram, It uses only 25% of memory also no other process were processed simultaneously. – user2302158 Nov 27 '18 at 11:28
  • Thanks; so there should be plenty of wiggle room. If you can provide answers to my other questions above that may still help give us some clues; the more you give us the more able we are to help you... – JohnLBevan Nov 27 '18 at 11:31
  • 1
    I will give an other try by stooping all other process to make Ram utilization to lower level and start trigger the console app. With this we will may ever narrow down possible causes . – user2302158 Nov 27 '18 at 11:38
  • thanks in advance, I have posted my memory utilization, Please find it on my post – user2302158 Nov 27 '18 at 12:23
  • Thanks; so it looks like you're no longer seeing the out of memory exception, but a timeout. Change `bulkCopy.BulkCopyTimeout = 60 * 10;` (assuming you're using David's code) to `bulkCopy.BulkCopyTimeout = 0;` (i.e. no timeout: https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy.bulkcopytimeout?view=netframework-4.7.2). In production you'd want to have some timeout value, but what that value should be should be an informed decision / once you get an idea of how long the operation takes. The previous value of 10 mins is pretty arbitrary. – JohnLBevan Nov 27 '18 at 12:29
  • I have tried that too but exception remains, Do i need to do anything on the Management Studio side? – user2302158 Nov 27 '18 at 12:33
  • When you say "exception remains" even with the timeout set to 0, could you share what the exceptions is? i.e. If we've disabled the timeout we shouldn't see a timeout exception; so maybe that's when we see the OOM exception, but that's not the exception shown in your screenshot. – JohnLBevan Nov 27 '18 at 13:01
  • When you mention "management studio", do you mean SSMS? Where does that come into the equation; are you running any code through that as well as the C# app? I have a feeling we're missing some context... – JohnLBevan Nov 27 '18 at 13:02
  • 1
    ps. You may also want to set the command timeout on the read code to 0. `command.CommandTimeout = 0;` https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand.commandtimeout?view=netframework-4.7.2 – JohnLBevan Nov 27 '18 at 13:05
  • 1
    yes bro it worked at last i have just added the command timeout on the read code,and this time the ram utilization move to 60% please find the screen shot of ram usage – user2302158 Nov 27 '18 at 13:49