I have a .NET 4.0 program developed in Visual Studio 2012 that migrates data from a legacy system into a new system using a dataset for the legacy system data and Entity Framework for the new system. I have been having some serious memory consumption/memory leak issues with the program, particularly with large data sets, so I am using windbg to try to get to the bottom of the issue.
The program reads a large volume of DataRows from multiple legacy-system Access databases and stores rows of the same type in a List(Of T). Next, various procedures iterate through those lists and create new entities that get saved into the new system. After reading all DataRows into a given list using the GetData() method of the TableAdapter, I close the connection to the Access database. After parsing all items in the List(Of T), I clear the list and set it to Nothing. For saving to the new database via Entity Framework, I batch records so that I submit changes for 100 records at a time, and after each submit I dispose, nullify and reinstantiate the EF ObjectContext.
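To make the flow concrete, here is a minimal, self-contained sketch of the batching pattern described above. It is not my real code: FakeContext is a hypothetical stand-in for the EF ObjectContext, and the list of strings stands in for the List(Of T) of typed DataRows.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the EF ObjectContext; not the real EF API.
Class FakeContext
    Implements IDisposable

    Private ReadOnly _pending As New List(Of String)()
    Public Shared TotalSaved As Integer = 0

    Public Sub Add(item As String)
        _pending.Add(item)
    End Sub

    ' Pretends to submit the pending batch to the new system.
    Public Sub SaveChanges()
        TotalSaved += _pending.Count
        _pending.Clear()
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        _pending.Clear()
    End Sub
End Class

Module BatchSketch
    Const BatchSize As Integer = 100

    Sub Main()
        ' Stand-in for the list filled from TableAdapter.GetData().
        Dim rows As New List(Of String)()
        For i As Integer = 1 To 250
            rows.Add("row " & i.ToString())
        Next

        Dim context As New FakeContext()
        Dim inBatch As Integer = 0

        For Each row As String In rows
            context.Add(row)
            inBatch += 1
            If inBatch = BatchSize Then
                ' Submit, then dispose and reinstantiate the context.
                context.SaveChanges()
                context.Dispose()
                context = New FakeContext()
                inBatch = 0
            End If
        Next

        ' Flush the final partial batch, then release everything.
        If inBatch > 0 Then context.SaveChanges()
        context.Dispose()
        context = Nothing

        rows.Clear()
        rows = Nothing

        Console.WriteLine(FakeContext.TotalSaved)
    End Sub
End Module
```

In the real program the rows are typed DataRows, which presumably still reference their parent DataTable, so clearing the list only drops the list's own references to them.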
After calling GC.Collect(), GC.WaitFullGCCollect() and GC.Collect() again, I took a memory dump through Visual Studio while debugging. At that point all records had been parsed and imported, and Task Manager showed the process consuming upwards of 700 MB of memory. I then loaded that dump into WinDbg.
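For reference, this is a minimal sketch of a forced-collection sequence of that shape. It assumes the middle call corresponds to GC.WaitForPendingFinalizers(), which is the usual companion to a double GC.Collect(); that mapping is an assumption, not a copy of my actual code.

```vbnet
Imports System

Module GcSketch
    Sub Main()
        ' First pass: collect everything that is currently unreachable.
        GC.Collect()
        ' Assumption: let the finalizer thread run any pending finalizers.
        GC.WaitForPendingFinalizers()
        ' Second pass: reclaim objects that were only kept alive for finalization.
        GC.Collect()

        ' Report the managed heap size after collection, in bytes.
        Console.WriteLine(GC.GetTotalMemory(False))
    End Sub
End Module
```

Anything still on the managed heap after a sequence like this is reachable from some root, which is why the surviving objects in the dump are surprising.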
When I issue the !dumpheap -stat command, I get a long list of object types with their instance counts and total sizes. The list is sorted by total size, so the types using the most memory appear at the bottom. Here is a snippet of the bottom of that list:
      MT    Count    TotalSize Class Name
67ac8034       28      1233489 System.Boolean[]
025b0fd4    24829      1787688 Importer.dsIBET+tblTKNumberRow
67ad3a70     2840     16078424 System.Int32[]
67ac0f78        4     18874416 System.Double[]
67adfcd8       18     19136728 System.Decimal[]
67add67c       33     19657100 System.DateTime[]
101a04cc      508     25441232 System.Data.RBTree`1+Node[[System.Int32, mscorlib]][]
101a0b70      509     25442268 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
025b19b4   761228     54808416 Importer.dsIBET+tblLogsDataRow
005244a8      390     85776048 Free
67abfe8c    16421     90050676 System.Object[]
67ad224c  9211552    294563252 System.String
As you can see there are a TON of objects still resident in memory that I would expect to not be present. In particular the tblLogsDataRow objects, and the System.String objects.
Here is where my knowledge of what to do next starts to get a little hazy. I have run !dumpheap -mt on some of these types, for example System.String, then picked one of the over 9 million string instances at random and run !gcroot on it. From that I can discern that the strings seem to be related to my entities. There should be no entities left at this point in the call stack, since I have already disposed of and nullified my ObjectContext. So I am unclear why all of these strings are resident in memory, or where exactly they are.
As for the DataRow objects, I also don't understand how any of these are still present, because I store them in a list, then later clear that list and set it to Nothing.
Clearly I'm not doing something right, but I'm a little lost as to how to figure that out.
So my question has two parts:
1) Can you provide any pointers on how to use WinDbg to get further information that might help me diagnose the source of the memory leaks?
2) Can you provide any insights into what I'm doing wrong generally as far as the DataSet and Entity Framework go? It seems that storing the DataRows in lists and then clearing those lists is not freeing the memory they use, and that disposing of my ObjectContext is not releasing the resources it holds.
Thanks, Josh