
I have a C# (.NET 3.5) application which imports thousands of files. Right now, I create a background worker for each file. It works well up to a certain limit, and then the application dies with a System.OutOfMemoryException. I am assuming this is happening because of the large number of threads. Is a thread pool a good solution for this situation?

The exception is:

    System.OutOfMemoryException | Exception of type 'System.OutOfMemoryException' was thrown.
    at System.Data.RBTree`1.TreePage..ctor(Int32 size)
    at System.Data.RBTree`1.AllocPage(Int32 size)
    at System.Data.RBTree`1.InitTree()
    at System.Data.Index.InitRecords(IFilter filter)
    at System.Data.Index..ctor(DataTable table, Int32[] ndexDesc, IndexField[] indexFields, Comparison`1 comparison, DataViewRowState recordStates, IFilter rowFilter)
    at System.Data.DataTable.GetIndex(IndexField[] indexDesc, DataViewRowState recordStates, IFilter rowFilter)
    at System.Data.DataColumn.get_SortIndex()
    at System.Data.DataColumn.IsNotAllowDBNullViolated()
    at System.Data.DataTable.EnableConstraints()
    at System.Data.DataTable.set_EnforceConstraints(Boolean value)
    at System.Data.DataTable.EndLoadData()
    at System.Data.Common.DataAdapter.FillFromReader(DataSet dataset, DataTable datatable, String srcTable, DataReaderContainer dataReader, Int32 startRecord, Int32 maxRecords, DataColumn parentChapterColumn, Object parentChapterValue)
    at System.Data.Common.DataAdapter.Fill(DataTable[] dataTables, IDataReader dataReader, Int32 startRecord, Int32 maxRecords)
    at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
    at System.Data.Common.DbDataAdapter.Fill(DataTable[] dataTables, Int32 startRecord, Int32 maxRecords, IDbCommand command, CommandBehavior behavior)
    at System.Data.Common.DbDataAdapter.Fill(DataTable dataTable)
    at Dms.Data.Adapters.DataTableAdapterBase`2.FillByCommand(TTbl table, DbCommand command)
Rik
  • Updated the question. It's in C#. Thanks. – Rik Jan 26 '12 at 18:22
  • If your application is 32 bit and you load more than 1.5 GB of data in memory, you're probably going to get an out of memory exception. See this question for more info: http://stackoverflow.com/questions/1109558/allocating-more-than-1-000-mb-of-memory-in-32-bit-net-process So are you loading more than 1.5 GB of data at once? – Kiril Jan 26 '12 at 18:25
  • Are you importing the data from the thousands of files into some internal collection in the application OR are you putting the data in a database or something like that? – pstrjds Jan 26 '12 at 18:29
  • I don't know what version of .net you are using, but it may help to take a look at this: http://www.codeproject.com/Articles/12551/Sending-Files-in-Chunks-with-MTOM-Web-Services-and – Alex Mendez Jan 26 '12 at 18:41
  • I am importing the files and putting them into a database. – Rik Jan 26 '12 at 18:53

4 Answers


The problem is most likely that you're trying to load too many files at once.

Using a ThreadPool may help, since it gives you a means of limiting the processing. However, if you're importing and processing "thousands of files", a better approach may be to create a pipeline to handle your processing, and then feed your files into it. This lets you control the degree of concurrency and prevents too many individual files from being processed at the same time, keeping your memory and processing requirements at a more reasonable level.


Edit:

Since you (now) mentioned that you're using C#: the BackgroundWorker actually runs on the ThreadPool already. Switching to the thread pool directly may still be a good idea, but it likely won't solve the issue entirely. You may want to consider using something like BlockingCollection&lt;T&gt; to set up a producer/consumer queue: add all of the files to the BlockingCollection&lt;T&gt;, and have one or more threads "consume" and process them. This gives you control over how many files are handled at once (just add another processing thread as capacity allows).
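To make the producer/consumer idea concrete, here is a minimal sketch (note that `BlockingCollection<T>` requires .NET 4, which the question predates on .NET 3.5; `ImportFile` and the directory path are placeholders for your own import logic):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ImportQueue
{
    // Bounded to 100 pending files so the producer can't run ahead of the consumers.
    static readonly BlockingCollection<string> Files = new BlockingCollection<string>(100);

    static void Main()
    {
        // Start a fixed number of consumers; tune this count to your hardware.
        const int consumerCount = 4;
        var consumers = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Factory.StartNew(() =>
            {
                // GetConsumingEnumerable blocks until an item is available
                // and exits cleanly once CompleteAdding has been called.
                foreach (string path in Files.GetConsumingEnumerable())
                    ImportFile(path);
            });
        }

        foreach (string path in System.IO.Directory.GetFiles(@"C:\import"))
            Files.Add(path);        // blocks if the queue is full

        Files.CompleteAdding();     // signal that no more files are coming
        Task.WaitAll(consumers);
    }

    static void ImportFile(string path) { /* parse the file and insert it into the database */ }
}
```

The bounded capacity is what protects memory: no matter how many files the producer enumerates, only a fixed number are ever queued or in flight at once.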

Reed Copsey
  • +1 for BlockingCollection recommendation. A much smarter choice for the user (if he is describing everything correctly) – Bryan Crosby Jan 26 '12 at 18:53

It can be, yes. Consider that there are only a finite number of CPUs or cores, and only that many threads can run concurrently. You can usefully have more threads active if many of them are waiting on some other process running on a different computer (say, if you're downloading these files). But just because you have a separate thread doesn't mean it's adding concurrency; it just adds switching costs and memory allocation (as you've seen). Depending on the amount of idle time, try limiting your pool to just slightly more threads than CPUs, and tweak from there.
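On .NET 3.5 (no TPL, no BlockingCollection), one way to cap the number of in-flight imports at "slightly more threads than CPUs" is to gate ThreadPool work items with a Semaphore. This is a sketch under that assumption; `ImportFile` and the directory path are placeholders:

```csharp
using System;
using System.Threading;

class ThrottledImport
{
    static void Main()
    {
        string[] files = System.IO.Directory.GetFiles(@"C:\import");

        // Allow only a few more concurrent imports than cores,
        // instead of one worker per file.
        int limit = Environment.ProcessorCount + 2;
        Semaphore gate = new Semaphore(limit, limit);
        int pending = files.Length;
        ManualResetEvent allDone = new ManualResetEvent(files.Length == 0);

        foreach (string path in files)
        {
            gate.WaitOne();    // blocks once 'limit' imports are in flight
            ThreadPool.QueueUserWorkItem(state =>
            {
                try { ImportFile((string)state); }
                finally
                {
                    gate.Release();
                    if (Interlocked.Decrement(ref pending) == 0)
                        allDone.Set();    // last import finished
                }
            }, path);
        }

        allDone.WaitOne();    // wait for every import to complete
    }

    static void ImportFile(string path) { /* parse and insert into the database */ }
}
```

Because the main thread blocks on `gate.WaitOne()` before queueing each item, at most `limit` files are ever being imported (and held in memory) at once.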

joe
  • When I import files, I don't think it really needs to wait for any other process. The user enters the name of a txt file which lists all the files with their locations; the application parses this file line by line, and each file is read from the specified location and copied into the application database. Does this clarify my scenario? – Rik Jan 26 '12 at 19:04
  • My answer was generic. I didn't know what language or framework you were using. I am not familiar with C#. Thread models are language agnostic, but as shown by the other responders, it's nice to be "talking" the same language to communicate design. That said, there may not be that much difference in pooling vs pipeline. Boss/worker is effectively pooling. Either way, you're funnelling many separate tasks through a finite resource. Still speaking generically (for what it's worth), you may find the model isn't as important as how it fits with the structure of your application. – joe Jan 27 '12 at 00:59

I think it is a good choice. However, BackgroundWorker has been somewhat superseded by the Task Parallel Library in .NET 4, which optimises based on the number of processors on your machine and dishes out the work accordingly. Perhaps you could use the TPL with a parallel loop. You can pass in the maximum number of concurrent thread pool threads in order to limit how many files you import at once, in batches, e.g.:

    // Requires .NET 4 (System.Threading.Tasks).
    ParallelOptions options = new ParallelOptions();
    options.MaxDegreeOfParallelism = 4;
    Parallel.ForEach(files, options, file => ImportFile(file));

This might help you.

Jeb

If I have understood you right, you need to implement a producer-consumer approach: 1) one producer produces the list of files to be imported; 2) several consumers (a fixed number) perform the import.

To achieve this, you can use BlockingCollection (available since .NET 4.0). There's an example in the documentation.

  • BlockingCollection is available in .NET 4.0, but I am working on .NET 3.5. If I limit the number of background workers running concurrently, will it help? – Rik Jan 26 '12 at 20:48
  • No problem at all, please take a look at the blocking queue class implementations here: http://stackoverflow.com/questions/530211/creating-a-blocking-queuet-in-net – Sergey Vyacheslavovich Brunov Jan 26 '12 at 20:53
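Since the question targets .NET 3.5, a hand-rolled bounded blocking queue along the lines of the linked answers might look like the following sketch. It uses only Monitor, which is available on 3.5; the class name and capacity are illustrative choices, not from the original thread:

```csharp
using System.Collections.Generic;
using System.Threading;

// A minimal bounded blocking queue for .NET 3.5.
class BlockingQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly int _capacity;
    private bool _closed;

    public BlockingQueue(int capacity) { _capacity = capacity; }

    public void Enqueue(T item)
    {
        lock (_queue)
        {
            while (_queue.Count >= _capacity)
                Monitor.Wait(_queue);      // wait for a consumer to make room
            _queue.Enqueue(item);
            Monitor.PulseAll(_queue);      // wake any waiting consumers
        }
    }

    // Returns false once the queue is closed and drained.
    public bool TryDequeue(out T item)
    {
        lock (_queue)
        {
            while (_queue.Count == 0 && !_closed)
                Monitor.Wait(_queue);
            if (_queue.Count == 0) { item = default(T); return false; }
            item = _queue.Dequeue();
            Monitor.PulseAll(_queue);      // wake any waiting producers
            return true;
        }
    }

    public void Close()
    {
        lock (_queue) { _closed = true; Monitor.PulseAll(_queue); }
    }
}
```

The producer enqueues file paths and calls `Close()` when done; each consumer thread loops on `TryDequeue` until it returns false. The bounded capacity keeps the number of pending files, and hence memory use, fixed.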