
I'm using a DataTable to hold a running log of the last 1000 messages, FIFO style. I add items to the DataTable and remove the first row once it grows past 1000 items. However, even though the DataTable never exceeds 1000 items, the memory keeps growing over time.

Sample:

 DataTable dtLog = new DataTable();
 dtLog.Columns.Add("Name", typeof(string));

 for (int nLoop = 0; nLoop < 10000; nLoop++)
 {
     LogType oLog = new LogType();
     oLog.Name = "Message number " + nLoop;

     dtLog.Rows.Add(oLog.Name);
     if (dtLog.Rows.Count > 1000)
         dtLog.Rows.RemoveAt(0); // drop the oldest row (FIFO)
 }

So the messages are removed from the DataTable, but the memory doesn't seem to get released. I would expect the memory to be released...?

Or perhaps there's a better way to keep a running log than a DataTable?

Zonus
  • Measuring used memory is very tricky in a managed environment, where the garbage collector doesn't work in sync with your code. – Steve May 30 '17 at 16:24
  • It sounds like you want a Queue? https://msdn.microsoft.com/en-us/library/7977ey2c(v=vs.110).aspx –  May 30 '17 at 16:24
  • You need a reverse counter to handle this properly; take a look at the simple example here: https://stackoverflow.com/questions/5648339/deleting-specific-rows-from-datatable – MethodMan May 30 '17 at 16:29
  • Do you mean to say the memory keeps increasing even though the datatable doesn't? If not I would expect the memory to stabilise when you get to 1000 records - as you are creating a record & then deleting one. – PaulF May 30 '17 at 16:29
  • Deleting a DataRow merely changes the RowState property to DataRowState.Deleted; it does not remove the row from the table. So sure, memory usage *should* increase. A workaround would be to call AcceptChanges() once in a while. – Hans Passant May 30 '17 at 16:53
  • @HansPassant: Although the DataRowState is set, the DataRow objects are later removed from the internal array and are no longer rooted. See my measurements below. The removal is done in System.Data.DataTable.SetNewRecordWorker. – Alois Kraus May 30 '17 at 21:01

3 Answers


I can't speak to the memory leak part of your question, as memory management and garbage collection in .Net make that a hard thing to investigate.

But what I can do is suggest that, unless you have to, you should never use DataTables in .Net.

Now, "never" is a pretty strong claim! That sort of thing needs backing up with good reasons.

So, what are those reasons? ... memory usage.

I created this .net fiddle: https://dotnetfiddle.net/wOtjw1

using System;
using System.Collections.Generic;
using System.Xml;
using System.Data;

public class DataObject
{
    public string Name { get; set; }
}

public class Program
{
    public static void Main()
    {
        Queue(); // swap in DataTable() for the comparison run
    }

    public static void DataTable()
    {
        var dataTable = new DataTable();
        dataTable.Columns.Add("Name", typeof(string));

        for (int nLoop = 0; nLoop < 10000; nLoop++)
        {
            var dataObject = new DataObject();
            dataObject.Name = "Message number " + nLoop;

            dataTable.Rows.Add(dataObject.Name); // add the string value; passing the object itself would just store its ToString()

            if (dataTable.Rows.Count > 1000)
                dataTable.Rows.RemoveAt(0);
        }   
    }

    public static void Queue()
    {
        var queue = new Queue<DataObject>();

        for (int nLoop = 0; nLoop < 10000; nLoop++)
        {
            var dataObject = new DataObject();
            dataObject.Name = "Message number " + nLoop;

            queue.Enqueue(dataObject);

            if (queue.Count > 1000)
                queue.Dequeue();
        }   
    }
}

Run it twice, once with the DataTable method, once with the Queue method.

Look at the memory usage .net fiddle reports each time:

DataTable Memory: 2.74Mb

Queue Memory: 1.46Mb

It's almost half the memory usage! And all we did was stop using DataTables.

.Net DataTables are notoriously memory hungry. There are fairly good reasons for that: they can store lots of complex schema information, track changes, and so on.
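
For example, every row carries change-tracking state that the table holds on to until the changes are accepted. A quick illustrative snippet (not part of the original answer):

var dt = new DataTable();
dt.Columns.Add("Name", typeof(string));

var row = dt.Rows.Add("hello");
Console.WriteLine(row.RowState);  // Added

dt.AcceptChanges();
Console.WriteLine(row.RowState);  // Unchanged

row.Delete();
Console.WriteLine(row.RowState);  // Deleted - the row data is still held by the table

dt.AcceptChanges();
Console.WriteLine(dt.Rows.Count); // 0 - only now is the row really gone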

That's great, but ... do you need those features?

No? Dump the DataTable and use something from System.Collections(.Generic) instead.

  • I looked at the Queue object; however, I had a hard time binding it to a DataGridView... This is to view activity on a background service that is running, which is why I didn't use it earlier. I'm open to using a queue if I can bind it to a view. – Zonus May 30 '17 at 19:13
  • That sounds like something you should post as a new question. –  May 31 '17 at 07:44
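
Regarding the DataGridView binding raised in the comments above: one possible approach is a BindingList<T>, which a DataGridView can bind to directly. A rough sketch, assuming a WinForms project (the BoundLog/LogEntry names are made up for illustration):

using System.ComponentModel;
using System.Windows.Forms;

public class LogEntry
{
    public string Name { get; set; }
}

public class BoundLog
{
    private readonly BindingList<LogEntry> entries = new BindingList<LogEntry>();
    private readonly int capacity;

    public BoundLog(int capacity)
    {
        this.capacity = capacity;
    }

    public void Add(string message)
    {
        entries.Add(new LogEntry { Name = message });
        if (entries.Count > capacity)
            entries.RemoveAt(0); // FIFO: drop the oldest entry
    }

    public void BindTo(DataGridView grid)
    {
        // BindingList raises ListChanged, so the grid refreshes on Add/RemoveAt.
        grid.DataSource = entries;
    }
}

Note that BindingList<T> is not thread-safe, so if the background service logs from another thread, the Add calls have to be marshalled onto the UI thread (e.g. via Control.Invoke).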

Whenever you modify or delete a row in a DataTable, the old/deleted data is still kept by the DataTable until you call DataTable.AcceptChanges. From the documentation:

When AcceptChanges is called, any DataRow object still in edit mode successfully ends its edits. The DataRowState also changes: all Added and Modified rows become Unchanged, and Deleted rows are removed.

There is no memory leak; this is by design.
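
To illustrate (a minimal sketch applied to the question's loop, not code from the original post): use Delete() plus AcceptChanges() so the deleted row data is actually released:

var dtLog = new DataTable();
dtLog.Columns.Add("Name", typeof(string));

for (int nLoop = 0; nLoop < 10000; nLoop++)
{
    dtLog.Rows.Add("Message number " + nLoop);

    if (dtLog.Rows.Count > 1000)
    {
        dtLog.Rows[0].Delete(); // only marks the row as Deleted; its data is still held
        dtLog.AcceptChanges();  // commits the delete, so the row data can be collected
    }
}

Calling AcceptChanges on every delete is simple but not free; it can be batched, but note that Rows.Count then still includes the rows that are merely marked Deleted.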

As an alternative you can use a circular buffer, which would fit this even better than a queue; a rough sketch follows.
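
A minimal circular buffer sketch (the CircularLogBuffer type and its API are made up for illustration): a fixed array plus a wrapping write index, so the oldest message is overwritten in place and nothing is ever shifted or reallocated:

using System.Collections.Generic;

public class CircularLogBuffer
{
    private readonly string[] items;
    private int next;  // slot where the next message is written
    private int count; // messages stored so far (at most items.Length)

    public CircularLogBuffer(int capacity)
    {
        items = new string[capacity];
    }

    public void Add(string message)
    {
        items[next] = message;             // overwrites the oldest slot once full
        next = (next + 1) % items.Length;
        if (count < items.Length)
            count++;
    }

    public IEnumerable<string> Snapshot() // returns messages oldest-first
    {
        int start = count < items.Length ? 0 : next;
        for (int i = 0; i < count; i++)
            yield return items[(start + i) % items.Length];
    }
}

The array is allocated once, so apart from the message strings themselves there is no per-message allocation or shifting, which is exactly what you want for a fixed-size running log.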

Sir Rufo
  • Thanks, I'll dig into the circular buffer! – Zonus May 30 '17 at 19:14
  • The circular buffer advice is good, but the DataRow objects are no longer rooted once they are removed from the DataTable. See my answer below for how to measure the actual memory footprint. – Alois Kraus May 30 '17 at 21:02

Your memory is released, but it is not so easy to see that. There is a lack of tools (except Windbg with SOS) that can show the currently allocated memory minus dead objects. Windbg has the !DumpHeap -live option for this, which displays only live objects.

I have tried the fiddle from AndyJ: https://dotnetfiddle.net/wOtjw1

First I needed to create a memory dump of the DataTable version to get a stable baseline. MemAnalyzer (https://github.com/Alois-xx/MemAnalyzer) is the right tool for that.

MemAnalyzer.exe -procdump -ma DataTableMemoryLeak.exe DataTable.dmp

This expects procdump from SysInternals in your path.

Now you can run the program with the queue implementation and compare the allocation metrics on the managed heap:

C>MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -dtn 3
        Delta(Bytes)    Delta(Instances)        Instances       Instances2      Allocated(Bytes)        Allocated2(Bytes)       AvgSize(Bytes)  AvgSize2(Bytes) Type
        -176,624        -10,008                 10,014          6               194,232                 17,608                  19              2934            System.Object[]
        -680,000        -10,000                 10,000          0               680,000                 0                       68                              System.Data.DataRow
        -7,514          -88                     20,273          20,185          749,040                 741,526                 36              36              System.String
        -918,294        -20,392                 60,734          40,342          1,932,650               1,014,356                                               Managed Heap(Allocated)!
        -917,472        0                       0               0               1,954,980               1,037,508                                               Managed Heap(TotalSize)

This shows that we have 917KB more memory allocated with the DataTable approach and that 10K DataRow instances are still floating around on the managed heap. But are these numbers correct?

No.

Most of these objects are already dead, but because no full GC happened before the memory dump was taken, they are still reported as alive. The fix is to tell MemAnalyzer to consider only rooted (live) objects, just as Windbg does with its -live option:

C>MemAnalyzer.exe -f DataTable.dmp -pid2 20792 -dts 5 -live
Delta(Bytes)    Delta(Instances)        Instances       Instances2      Allocated(Bytes)        Allocated2(Bytes)       AvgSize(Bytes)  AvgSize2(Bytes) Type
-68,000         -1,000                  1,000           0               68,000                  0                       68                              System.Data.DataRow
-36,960         -8                      8               0               36,960                  0                       4620                            System.Data.RBTree+Node<System.Data.DataRow>[]
-16,564         -5                      10              5               34,140                  17,576                  3414            3515            System.Object[]
-4,120          -2                      2               0               4,120                   0                       2060                            System.Data.DataRow[]
-4,104          -1                      19              18              4,716                   612                     248             34              System.String[]
-141,056        -1,285                  1,576           291             169,898                 28,842                                                  Managed Heap(Allocated)!
-917,472        0                       0               0               1,954,980               1,037,508                                               Managed Heap(TotalSize)

The DataTable approach still needs 141,056 more bytes of memory because of the extra DataRow, object[] and System.Data.RBTree+Node[] instances. Measuring only the working set is not enough, because the managed heap is deallocated lazily: the GC can hold on to large amounts of memory if it thinks the next memory spike is not far away. Committed memory is therefore a nearly meaningless metric, unless your (very low-hanging) goal is only to fix memory leaks that are gigabytes in size.
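
For a quick in-process sanity check (a sketch; no substitute for the dump-based comparison above), you can force a full collection before reading the managed heap size, so dead-but-not-yet-collected objects do not inflate the number:

GC.Collect();
GC.WaitForPendingFinalizers(); // let finalizers release what they hold
GC.Collect();                  // collect objects freed by those finalizers
long liveBytes = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine("Live managed bytes: " + liveBytes.ToString("N0"));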

The correct way to measure things is to measure the sum of

  • Unmanaged Heap
  • Allocated Managed Heap
  • Memory Mapped Files
  • Page File Backed Memory Mapped Files (Shareable Memory)
  • Private Bytes

This is actually exactly what MemAnalyzer does with the -vmmap switch, which expects vmmap from Sysinternals in its path.

MemAnalyzer -pid ddd -vmmap 

This way you can track unmanaged memory leaks and file mapping leaks as well. The return value of MemAnalyzer is the total allocated memory in KB.

  • If -vmmap is used it will report the sum of the above points.
  • If vmmap is not present it will only report the allocated managed heap.
  • If -live is added then only rooted managed objects are reported.

I wrote the tool because, to my knowledge, there are no tools out there that make it easy to look at memory leaks in a holistic way. I always want to know whether I am leaking memory, regardless of whether it is managed, unmanaged or something else.

[Pivot diff chart built from the CSV diff output]

By writing the diff output to a CSV file you can easily create pivot diff charts like the one above.

MemAnalyzer.exe -f DataTable.dmp -pid2 20792  -live -o ExcelDiff.csv

That should give you some ideas about how to track allocation metrics in a much more accurate way.

Alois Kraus
  • If I let this run for a few days, it can easily eat up a GB of RAM... I'm on VS2017, so it has a few more memory profiling tools than previous versions. – Zonus May 30 '17 at 21:50
  • Managed memory analysis for memory dumps is only part of the Ultimate Edition. Yes, VS has become better, but it is still not good. PerfView is much better, but you need to be careful with its probabilistic sampling approach, which can skew numbers a lot. If you compare the numbers of VS and MemAnalyzer you will find that VS works like MemAnalyzer without the -live switch, which makes it hard to check whether you now have more or less memory allocated when you try to optimize things. Sure, you can trigger GCs, but that is still a pretty manual approach. – Alois Kraus May 30 '17 at 22:30