6

I've always believed that a DataTable would consume more memory than a generic List. I'm comparing loading a DataTable against loading a List from a SQL Server query, and in this case the DataTable is consuming less memory. I'm getting the top 2000 rows, with 134 fields per row: one binary field and the rest standard varchar, int, bit, and so on.

How in the world could a DataTable, with all its overhead, consume less memory than a List? The GC reports approximately 4 MB with the DataTable and 5 MB with the List.

I did test with a few of the Northwind tables, and the List was marginally smaller in those cases.

private void GetMemory()
{
    label1.Text = string.Format("{0:0.00} MB", GC.GetTotalMemory(true) / 1024.0 / 1024.0);
}

private void DataTableButton_Click(object sender, EventArgs e)
{
    using (var conn = new SqlConnection(@"Initial Catalog=ADatabase;Data Source=AServer;Integrated Security=True"))
    using (var cmd = new SqlCommand("SELECT TOP 2000 * FROM AManyColumnedTable", conn))
    {
        conn.Open();

        using (var r = cmd.ExecuteReader())
        {
            _rows = new List<object[]>();

            // uses more memory
            while (r.Read())
            {
                var a = new object[r.FieldCount];
                r.GetValues(a);
                _rows.Add(a);
            }

            // uses less memory
            //_table = new DataTable("TheTable");
            //_table.Load(r);
        }
    }
    GetMemory();
}
Travis J
Steve

3 Answers

6

It's an illusion. Somehow the GC doesn't give you the right value, perhaps because some objects haven't been collected yet. Loading a DataTable may create more intermediate objects, which could trigger an extra garbage collection.

Your list of arrays stores the data in the same manner as the DataTable, but without the overhead of the extra information that the DataTable and each DataRow contain.

If you want it to be more memory efficient, you should create a class for a data record; that way you can store the value types as plain values instead of boxed objects. An int, for example, uses 20 bytes when it's boxed but only 4 as a plain member variable.
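A minimal sketch of the boxing point, assuming a hypothetical record with three of the table's columns (the field names are made up, not from the original table):

```csharp
using System;
using System.Collections.Generic;

class Record
{
    // Value types stored inline as plain fields: the int costs 4 bytes
    // and the bool 1 byte here, versus a full boxed object per value
    // (object header + type pointer + the value) in an object[] slot.
    public int Id;
    public bool IsActive;
    public string Name;
}

class Program
{
    static void Main()
    {
        var boxed = new List<object[]>(); // every int/bool gets boxed
        var typed = new List<Record>();   // value types stay unboxed

        for (int i = 0; i < 1000; i++)
        {
            boxed.Add(new object[] { i, true, "row" + i });
            typed.Add(new Record { Id = i, IsActive = true, Name = "row" + i });
        }

        Console.WriteLine(boxed.Count == typed.Count); // True
    }
}
```

The strings are reference types either way, so the savings come only from the int and bool columns; with 134 mostly value-typed fields per row, that adds up.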

Guffa
  • Interesting. I tried calling GC.Collect directly after loading the List; unfortunately it didn't change the result. I used GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);. I'll play around with the boxing; I didn't think of that, thanks. I was trying to keep it generic by using GetValues. I'm looking at the implementation of DataRow, and it's using a DataStorage class to hold values, not an object[] as I assumed. I've never heard of the DataStorage class; maybe it would help. Thanks for the insight! +1 – Steve Nov 20 '09 at 18:15
  • You're right, it was the boxing consuming the extra memory. I just tried loading the rows' values into generic lists using non-nullable primitive data types. The memory was 3.8 MB; the DataTable was 4.4 MB. Good call! – Steve Nov 20 '09 at 20:29
1

Just out of curiosity, try calling TrimExcess on the List and then check the memory. I doubt this will make a huge difference, but I suspect it will make some difference.

Basically, a List grows in batches. It doesn't grow by one slot every time you add an object; rather, it grows by a certain amount, and then allocates more when it needs it. I'm curious whether there's some extra space in your list that you're not using.
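The growth-in-batches behaviour can be seen directly through List&lt;T&gt;.Capacity; a small sketch (the item count of 1100 is arbitrary, chosen so the trim actually fires):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var list = new List<int>();
        for (int i = 0; i < 1100; i++)
        {
            list.Add(i);
        }

        // The backing array doubles as it fills: 4, 8, 16, ..., 1024, 2048.
        Console.WriteLine(list.Count);    // 1100
        Console.WriteLine(list.Capacity); // 2048

        // TrimExcess shrinks the backing array to Count, but only when
        // Count is below 90% of Capacity; here it is, so it trims.
        list.TrimExcess();
        Console.WriteLine(list.Capacity); // 1100
    }
}
```

Note that for a List&lt;object[]&gt; the slack is only unused references (8 bytes each on 64-bit), which is why trimming barely moves the needle in this case.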

BFree
  • I received the same result, unfortunately. I didn't think of that, nice post. – Steve Nov 20 '09 at 17:44
  • 1
    Note that TrimExcess only does anything if the amount of items in the List is smaller than 0.9 times the length of the array. This could be the difference between 5 MB and 4.5 MB ... explaining a whole 1 MB of difference this way is a stretch though. – Joren Nov 20 '09 at 17:55
  • 1
    As the list itself isn't larger than a few kilobyte, there is no surprise that there is no noticable difference in trimming it. The list only contains references, the unused items are just null references, not references to empty arrays. – Guffa Nov 20 '09 at 18:05
  • I was assuming the actual list was 5 MB. If not then the entire measurement is pointless. – Joren Nov 21 '09 at 17:14
0

For me it happened the other way around. I wasn't fetching data from the database but pushing data to it with bulk inserts. While processing data I ended up with around 50 million records to store in the database. I used bulk insert with 200k rows per push. Using a DataTable (with only 8 columns), each push averaged around 95 MB of data (measured with network monitoring tools and a serializer to find the object size). When I used a List via IDataReader for the bulk insert, the size of one push dropped to about 15 MB on average (again measured by serializing the object and watching Task Manager). Processing and uploading the data took 1657 seconds with the DataTable and dropped to 850 seconds with the List approach.
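For reference, the two approaches described above roughly correspond to the two WriteToServer overloads of SqlBulkCopy: one buffers a whole DataTable, the other streams rows from an IDataReader. A sketch, with placeholder connection string and table name:

```csharp
using System.Data;
using System.Data.SqlClient;

class BulkInsertSketch
{
    // Buffered: the entire batch is materialized in a DataTable first,
    // with per-row and per-column overhead, before being written out.
    static void InsertViaDataTable(string connString, DataTable batch)
    {
        using (var bulk = new SqlBulkCopy(connString))
        {
            bulk.DestinationTableName = "dbo.TargetTable"; // placeholder
            bulk.WriteToServer(batch);
        }
    }

    // Streamed: rows flow from the reader to the server as they are
    // read, so only a small window of rows is in memory at a time.
    static void InsertViaReader(string connString, IDataReader reader)
    {
        using (var bulk = new SqlBulkCopy(connString))
        {
            bulk.DestinationTableName = "dbo.TargetTable"; // placeholder
            bulk.WriteToServer(reader);
        }
    }
}
```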

Ashwin