1

I have code in my C# console app that queries a LARGE dataset in SQL Server and adds it to an IEnumerable collection that I iterate through later in the app. On a SQL table that returns fewer than 100K rows it works great, but I have to use this to iterate through 100 million records. After the SQL query runs and Dapper tries to fill the collection, I end up with an OUT OF MEMORY exception. I'm pretty certain it's because it's trying to materialize 100 million objects at once. Is there a way I can batch the collection with no more than, say, 500K objects, do what I need to do, then come back and process another 500K, and so on? I essentially need to READ 500K records from SQL, write those to a file, read another 500K, write to another file.

    public List<AxDlsd> GetDistinctDlsdObjects(AxApp axApp, OperationType operationType)
    {
        if (operationType == OperationType.Assessment)
        {
            string query = $"SELECT DISTINCT(clipid) from {axApp.dlname}";
            using (var connection = _dbConnectionFactory.GetAxDbConnection())
            {
                //SqlMapper.Settings.CommandTimeout = 0;

                // OUT OF MEMORY exception is thrown here while filling the list
                return connection.Query<AxDlsd>(query, commandTimeout: 0, buffered: false).ToList();
            }
        }

        return new List<AxDlsd>();
    }
MikeGen18
  • There are too many data rows in your query; memory can't hold all of them. – D-Shih Sep 06 '18 at 20:22
  • Try this: https://github.com/StackExchange/Dapper/blob/master/Readme.md#buffered-vs-unbuffered-readers – David Libido Sep 06 '18 at 20:27
  • Oh, and don't use .ToList(); it will probably cause the query to run to completion, creating all your objects and making buffered: false have no effect (a streaming sketch follows these comments). – David Libido Sep 06 '18 at 20:38
  • Possible duplicate of [Dapper. Paging](https://stackoverflow.com/questions/9848592/dapper-paging) – mjwills Sep 06 '18 at 21:47
  • Can I ask how you are planning to use 100 million records in your app? – mjwills Sep 06 '18 at 21:47
  • You are getting a list of distinct IDs from the database, then trying to query for the values associated with all of them. So the solution is to query for a subset of the IDs: if the IDs run from 1 to 10000, query only 1000 at a time instead of all 10000. – jdweng Sep 06 '18 at 22:01
  • It's probably a mistake to try to load all of this into memory in the first place. If you have 100 million records, let the database do its job and respond to the searches itself. Just moving 100M rows across the network from the database to your app will take much longer than the database would need to process the SQL statement and return targeted results. – Joel Coehoorn Sep 06 '18 at 23:45
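
Putting the comments' advice into code: below is a minimal sketch of the streaming approach, assuming the question's `_dbConnectionFactory`, `AxApp`, and `AxDlsd` types, and that `AxDlsd` exposes a `clipid` property; the output file names are illustrative. With `buffered: false`, Dapper yields rows from the data reader as they arrive, so enumerating with `foreach` and flushing every 500K rows never materializes all 100 million objects; the `OutOfMemoryException` comes from `.ToList()`, which forces full materialization.

    // Sketch only: streams the distinct clipids with buffered: false and
    // writes them out in 500K-row batches, one file per batch.
    // Assumes using System.Collections.Generic, System.IO, and Dapper.
    public void StreamDistinctClipIdsToFiles(AxApp axApp)
    {
        const int batchSize = 500000;
        string query = $"SELECT DISTINCT(clipid) FROM {axApp.dlname}";

        using (var connection = _dbConnectionFactory.GetAxDbConnection())
        {
            var batch = new List<string>(batchSize);
            int fileIndex = 0;

            // Do NOT call .ToList() here -- enumerate the unbuffered result directly.
            foreach (var row in connection.Query<AxDlsd>(query, commandTimeout: 0, buffered: false))
            {
                batch.Add(row.clipid.ToString());   // clipid property is an assumption
                if (batch.Count == batchSize)
                {
                    File.WriteAllLines($"clipids_{fileIndex++}.txt", batch);
                    batch.Clear();
                }
            }

            if (batch.Count > 0)
                File.WriteAllLines($"clipids_{fileIndex}.txt", batch);
        }
    }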

1 Answer

1

You can run a `SELECT COUNT(DISTINCT clipid) FROM {axApp.dlname}` first to get the total number of rows, and then use that to page:

int pageSize = 500000;

for (var page = 0; page * pageSize < total; page++)
{
    // ORDER BY and the ROWS keyword are both required for OFFSET/FETCH
    string query = $"SELECT DISTINCT(clipid) FROM {axApp.dlname} ORDER BY clipid OFFSET {page * pageSize} ROWS FETCH NEXT {pageSize} ROWS ONLY";
    // ...
}

This will allow you to go through 500K rows at a time, or whatever your page size is. OFFSET/FETCH does require SQL Server 2012 or later; I'm not sure what version of SQL Server you are using.
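
For completeness, here is a sketch of how the count and the paging loop might fit together end to end, again assuming the question's `_dbConnectionFactory`, `AxApp`, and `AxDlsd` types and a `clipid` property on `AxDlsd`; the output file names are illustrative:

    // Sketch only: counts the distinct clipids, then pages through them with
    // OFFSET/FETCH (SQL Server 2012+), writing each page to its own file.
    // Assumes using System.IO, System.Linq, and Dapper.
    public void ExportDistinctClipIdsByPage(AxApp axApp)
    {
        const int pageSize = 500000;

        using (var connection = _dbConnectionFactory.GetAxDbConnection())
        {
            int total = connection.ExecuteScalar<int>(
                $"SELECT COUNT(DISTINCT clipid) FROM {axApp.dlname}", commandTimeout: 0);

            for (var page = 0; page * pageSize < total; page++)
            {
                string query = $"SELECT DISTINCT(clipid) FROM {axApp.dlname} " +
                               $"ORDER BY clipid OFFSET {page * pageSize} ROWS " +
                               $"FETCH NEXT {pageSize} ROWS ONLY";

                var rows = connection.Query<AxDlsd>(query, commandTimeout: 0);
                File.WriteAllLines($"clipids_page_{page}.txt",
                                   rows.Select(r => r.clipid.ToString()));
            }
        }
    }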

Todd Skelton
  • I'm running SQL Server 2012 SP2 and I get the error when trying to run the above query in my app: System.Data.SqlClient.SqlException: 'Incorrect syntax near '0'. Invalid usage of the option NEXT in the FETCH statement.' – MikeGen18 Sep 06 '18 at 23:21
  • page = 0 in your example, so 0 * 500,000 is 0 – MikeGen18 Sep 06 '18 at 23:24
  • Right, you want page to be 0 for the first 500k-row query; then it will be 1 for the next, which will grab rows 500,001 to 1 million. – Todd Skelton Sep 06 '18 at 23:27
  • OFFSET is how many records to skip and FETCH is how many to retrieve, so on the first query, you want to skip 0 records and grab 500k. On the second query, you want to skip 500k records and grab the next 500k, etc.; see the worked queries after these comments. – Todd Skelton Sep 06 '18 at 23:30
  • @MikeGen18 Updated my answer, you need to add order by clipid to the query. – Todd Skelton Sep 06 '18 at 23:39
  • Thanks Todd! I'll give this a shot – MikeGen18 Sep 07 '18 at 03:17
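
To make the OFFSET/FETCH arithmetic in these comments concrete, here are the queries the answer's loop would generate for the first two pages (`{axApp.dlname}` stands in for the question's table name):

    -- page 0: skip 0 rows, fetch rows 1 to 500,000
    SELECT DISTINCT(clipid) FROM {axApp.dlname}
    ORDER BY clipid OFFSET 0 ROWS FETCH NEXT 500000 ROWS ONLY;

    -- page 1: skip 500,000 rows, fetch rows 500,001 to 1,000,000
    SELECT DISTINCT(clipid) FROM {axApp.dlname}
    ORDER BY clipid OFFSET 500000 ROWS FETCH NEXT 500000 ROWS ONLY;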