17

Given the opportunity, I'd rewrite this, but anyway, here's the code as it stands:

List<string> foobar;

Then we add a bunch of strings to foobar.

At count = 16777216, we hit an out-of-memory exception.

My understanding is that each string would be a different size. Indeed, looking at the data (it's not my data), most strings are 2 or 3 characters.

The question "what is the max limit of data into list in c#?" indicates that the max limit is:

The maximum number of elements that can be stored in the current implementation of List is, theoretically, Int32.MaxValue - just over 2 billion.

However:

In the current Microsoft implementation of the CLR there's a 2GB maximum object size limit. (It's possible that other implementations, for example Mono, don't have this restriction.)

In my example, I have, what, 16 million results * a few bytes? Task manager shows about a gig being used, but I have 8 gigs of RAM.
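
Rough math, assuming 64-bit references: the list's backing array at 16 million entries would be around 16,777,216 * 8 bytes ≈ 128 MB on its own, and each 2-3 character string carries maybe another 30-40 bytes of object overhead, so a few hundred MB more on top, which roughly matches what Task Manager is showing.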

16777216 (2^24) seems like a fairly specific value - suspiciously like a limit - but I can't find any documentation anywhere that a) backs this up or b) suggests a way around it.

Any help would be appreciated.


Some code:

List<string> returnList = new List<string>();
SqlDataReader dr; // executes a read on the database; I've removed that part as it works fine

if (dr.HasRows)
{
    while (dr.Read())
    {
        returnList.Add(dr.GetString(0).Trim());
    }
}

That's the simplified form; I now have some try/catch for the OOM exception, but this is the actual code that's giving me grief.

Mark Mayo
  • Heap size is different from RAM size; a portion of RAM is used as the heap, which is where objects are stored. http://stackoverflow.com/questions/2325370/c-sharp-increase-heap-size-is-it-possible – Amir Nov 22 '12 at 22:50
  • @JonSkeet - updated with some snippets, let me know if there's anything else required. – Mark Mayo Nov 22 '12 at 22:55
  • If you can, you could try increasing the size of the strings - just add a bunch of padding in at the end of each one or something. If that makes it run out sooner then it *might* be actually running out of memory for something, and the specific number could arise from regularities in the data. – Jeff Nov 22 '12 at 22:56
  • @Jeff - added a few spaces to each, and yes, it's now dying at 13 million records or so. Intriguing. – Mark Mayo Nov 22 '12 at 22:58
  • Is this a theoretical question or are you really adding 16 million items to a list for usage in an application? – Patrick Nov 22 '12 at 22:58
  • @Patrick I'm adding 21 million, actually :/ It's a data migration; the code was written for a small migration and is now being shoehorned to work with a much larger data set. – Mark Mayo Nov 22 '12 at 23:00
  • Are you able to process the data as you go instead of trying to keep it all in memory at once? – TheEvilPenguin Nov 22 '12 at 23:00
  • Yes, although only with a substantial rewrite, so I'm trying to find a faster solution. In theory I could just query the db for the values rather than querying the loaded table values in memory. – Mark Mayo Nov 22 '12 at 23:02
  • @MarkMayo I've done something like this some time back; it was an SSIS package dealing with a large amount of data, as you are. What I did was create a stored proc on my table (with millions of records), and each time fetched 1000 records, transferred them across, released the list, and moved to the next lot. – Amir Nov 22 '12 at 23:04
  • I would initialize the list with the correct size, for example by using `select count(*) from TableName` first and [this constructor](http://msdn.microsoft.com/en-us/library/dw8e0z9z.aspx). Then you would avoid the doubling algorithm used to resize the backing buffer of `List.Add`. – Tim Schmelter Nov 22 '12 at 23:05
  • @TimSchmelter - will give that a try. back shortly. – Mark Mayo Nov 22 '12 at 23:15
  • @TimSchmelter - now it's getting even fewer... :/ – Mark Mayo Nov 22 '12 at 23:22

3 Answers

13

If you're trying to use very large lists in 64-bit environments, you need to enable large objects in the application configuration.

http://msdn.microsoft.com/en-us/library/hh285054.aspx
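
For reference, the setting that link describes is the gcAllowVeryLargeObjects element (it requires .NET 4.5 and a 64-bit process); roughly, the app.config would look like this:

    <configuration>
      <runtime>
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>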

The OOM is likely due to the way Lists/ArrayLists allocate memory: I believe that each time their capacity is reached, they attempt to double in size, and the list cannot double from 2^24. You could theoretically maximize your list size by pre-specifying a size (i.e. up to the 2GB limit).
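
A rough sketch of that pre-sizing idea (my own illustration: the table and column names are placeholders, and conn is assumed to be an open SqlConnection):

    // Count the rows first so the list's backing array is allocated once and never has to grow.
    int rowCount;
    using (var countCmd = new SqlCommand("SELECT COUNT(*) FROM veryBigTable", conn))
    {
        rowCount = (int)countCmd.ExecuteScalar();
    }

    var returnList = new List<string>(rowCount); // capacity set up front
    using (var cmd = new SqlCommand("SELECT Name FROM veryBigTable", conn))
    using (var dr = cmd.ExecuteReader())
    {
        while (dr.Read())
        {
            returnList.Add(dr.GetString(0).Trim());
        }
    }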

Steve Py
1

I've posted exactly what I did here; it's worth giving it a go. Again, the steps are:

  1. On each iteration, query a portion of the data using a stored proc
  2. Transfer them
  3. Move to the next portion

    List<string> returnList;
    int index = 0;
    SqlCommand cmd = new SqlCommand("ExampleStoredProc", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    // add the parameter once, then just update its value each time round the loop
    SqlParameter indexParam = cmd.Parameters.Add("@index", SqlDbType.Int);
    while (true)
    {
        indexParam.Value = index;
        // dispose the reader so the connection is free for the next ExecuteReader
        using (SqlDataReader dr = cmd.ExecuteReader())
        {
            if (!dr.HasRows)
            {
                break; // no more batches
            }
            returnList = new List<string>();
            while (dr.Read())
            {
                returnList.Add(dr.GetString(0).Trim());
            }
            //transfer data here
        }
        index++;
    }
    

and the stored proc should be something like this:

CREATE PROCEDURE ExampleStoredProc
    @index INT
AS
BEGIN
    SELECT * 
    FROM  veryBigTable
    WHERE Id >= (@index * 1000) AND Id < ((@index + 1) * 1000)
END
GO

It'll definitely work no matter how many records you have; the more data you have, the longer it'll take to finish.

Amir
1

If it's getting even fewer than 2^24 when you manually set the correct list size, then that's probably on the right track: instead of getting to 16 million and then trying to double the size of the list, it's making the list really large to begin with and running out of memory earlier.

That explains why you were getting a round number: it reached 2^24, then tried to increase in size, which caused it to run out of memory.

Sounds to me like it's some kind of 'natural' object size limit, as opposed to one in the implementation of the list.
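
If you want to watch that growth behaviour yourself, here's a small standalone sketch (my own illustration; the capacity roughly doubles in the Microsoft implementation):

    using System;
    using System.Collections.Generic;

    class CapacityDemo
    {
        static void Main()
        {
            var list = new List<string>();
            int lastCapacity = -1;
            // Small scale, but the same doubling pattern applies at 2^24 elements.
            for (int i = 0; i < 1000000; i++)
            {
                list.Add("x");
                if (list.Capacity != lastCapacity)
                {
                    Console.WriteLine("Count = {0:N0}, Capacity = {1:N0}", list.Count, list.Capacity);
                    lastCapacity = list.Capacity;
                }
            }
            // At 2^24 elements the backing string[] alone is about 16,777,216 * 8 bytes = 128 MB
            // (64-bit references), and growing it means allocating a new ~256 MB array while the
            // old one is still alive - which is where the out-of-memory error can appear.
        }
    }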

Jeff