1

I have an application that we are developing using .NET 4.0 and EF 6.0. The premise of the program is quite simple: watch a particular folder on the file system. As a new file gets dropped into this folder, look up information about the file in the SQL Server database (using EF) and then, based on what is found, move the file to another folder on the file system. Once the file move is complete, go back to the DB and update the information about the file (register the file move).

These are large media files so it might take a while for each of them to move to the target location. Also, we might start this service with hundreds of these media files sitting in the source folder already that will need to be dispatched to the target location(s).

So to speed things up, I started out using the Task Parallel Library (async/await is not available, as this is .NET 4.0). For each file in the source folder, I look up info about it in the DB, determine which target folder it needs to move to, and then start a new task that begins to move the file…

LookupFileinfoinDB(filename)
{
  // use EF DB Context to look up file in DB
}

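// start a new task to begin the file move
var filemoveTask = Task<bool>.Factory.StartNew(
    () =>
    {
        var success = false;

        try
        {
            // the code that actually moves the file goes here…
            .......
            success = true;
        }
        catch (Exception)
        {
            // the move failed, so success stays false
        }

        // the continuation reads this value through t.Result
        return success;
    });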

Now, once this task completes, I have to go back to the DB and update the info about the file, and that is where I am running into problems. (Keep in mind that I might have several of these 'move file' tasks running in parallel, and they will finish at different times.) Currently, I am using task continuations to register the file move in the DB:

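filemoveTask.ContinueWith(
    t =>
    {
        // IsCompleted is always true inside a continuation, so check that the
        // task ran to completion before reading Result
        if (t.Status == TaskStatus.RanToCompletion && t.Result)
        {
            RegisterFileMoveinDB();
        }
    });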

The problem is that I am using the same DB context for looking up the file info in the main task and inside the RegisterFileMoveinDB() method, which executes later on the nested task. I was getting all kinds of weird exceptions (mostly about the SQL Server data reader) when moving several files together. Searching online revealed that sharing a DB context among several tasks, as I am doing here, is a big no-no because EF is not thread-safe.

I would rather not create a new DB context for each file move, as there could be dozens or even hundreds of them going at the same time. What would be a good alternative approach? Is there a way to 'signal' the main task when a nested task completes and finish the file move registration in the main task? Or am I approaching this problem in the wrong way altogether, and is there a better way to go about this?

Fike Rehman
  • I would just scope separate DbContext object inside each of RegisterFileMoveinDB and LookupFileinfoinDB. – David Browne - Microsoft Jun 26 '17 at 20:04
  • You are working with external resources (file system, database), so `async-await` can be better for your case than "wasting" threads on IO operations. You can use `async-await` in .NET 4.0. [Using async/await without .NET Framework 4.5](https://blogs.msdn.microsoft.com/bclteam/2012/10/22/using-asyncawait-without-net-framework-4-5/) – Fabio Jun 26 '17 at 20:41
  • @Fabio - how are threads being wasted? What do you think happens when you call an awaitable `xyzAsync(...)` method? – Moho Jun 26 '17 at 21:20
  • @Moho, a thread that executes an IO operation does nothing; it only waits for the response. `async-await` makes it possible to execute asynchronous IO operations on one thread. Note that I am talking about asynchronous IO operations. What happens when you call `await xyzAsync(...)` depends on how `xyzAsync` is implemented. – Fabio Jun 27 '17 at 06:14
  • so, if you `await` an async IO operation, a thread is not allocated/created per IO - all IO operations are handled by the same thread? – Moho Jun 27 '17 at 06:22

3 Answers

6

Your best bet is to scope your DbContext for each thread. Parallel.ForEach has overloads that are useful for this (the overloads with a Func<TLocal> localInit parameter):

Parallel.ForEach( 
    fileNames, // the filenames IEnumerable<string> to be processed
    () => new YourDbContext(), // Func<TLocal> localInit
    ( fileName, parallelLoopState, dbContext ) => // body
    {
        // your logic goes here
        // LookUpFileInfoInDB( dbContext, fileName )
        // MoveFile( ... )
        // RegisterFileMoveInDB( dbContext, ... )

        // pass dbContext along to the next iteration
        return dbContext;
    },
    ( dbContext ) => // Action<TLocal> localFinally
    {
        dbContext.SaveChanges(); // single SaveChanges call for each thread
        dbContext.Dispose();
    } );

You can call SaveChanges() within the body expression/RegisterFileMoveInDB if you prefer to have the DB updated ASAP. I would suggest tying the file system operations in with the DB transaction so that if the DB update fails, the file system operations are rolled back.
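
To make the lifetime of the thread-local state easier to see, here is a minimal, self-contained sketch of the same overload. `FakeDbContext` and `Dispatcher.ProcessAll` are hypothetical stand-ins introduced only for illustration (they are not part of EF or of the code above); the fake context merely counts how it is used. `localInit` runs once per task that takes part in the loop, so the number of contexts created is typically far smaller than the number of files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an EF DbContext; it only counts how it is used.
public sealed class FakeDbContext : IDisposable
{
    public static int Created;   // number of contexts constructed
    public static int Disposed;  // number of contexts disposed
    public int PendingChanges;   // changes tracked but not yet saved

    public FakeDbContext() { Interlocked.Increment(ref Created); }

    // Pretends to flush the tracked changes and returns how many there were.
    public int SaveChanges()
    {
        int count = PendingChanges;
        PendingChanges = 0;
        return count;
    }

    public void Dispose() { Interlocked.Increment(ref Disposed); }
}

public static class Dispatcher
{
    // Processes every file name and returns the total number of changes saved.
    public static int ProcessAll(IEnumerable<string> fileNames)
    {
        int saved = 0;

        Parallel.ForEach(
            fileNames,
            () => new FakeDbContext(),        // localInit: one context per task
            (fileName, loopState, context) => // body: runs once per file
            {
                // Lookup, file move, and registration would go here.
                context.PendingChanges++;
                return context;               // hand the context to the next iteration
            },
            context =>                        // localFinally: once per task
            {
                Interlocked.Add(ref saved, context.SaveChanges());
                context.Dispose();
            });

        return saved;
    }
}
```

Calling `Dispatcher.ProcessAll` with 100 file names should save 100 changes, while `FakeDbContext.Created` stays equal to `FakeDbContext.Disposed`, since every context that is created is also disposed in `localFinally`.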

Moho
  • Hi Moho, this scenario seems to be similar to what I need. Could you please help me with it? Here is the link: https://stackoverflow.com/questions/46333707/task-factory-starnew-and-applicationdbcontext-update – parismiguel Sep 21 '17 at 19:37

1

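You could also pass the ExclusiveScheduler of a ConcurrentExclusiveSchedulerPair instance as a parameter of ContinueWith. This way the continuations will run sequentially instead of concurrently with respect to each other. Note that ConcurrentExclusiveSchedulerPair was introduced in .NET Framework 4.5, so it is not available when targeting .NET 4.0.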

TaskScheduler exclusiveScheduler
    = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;

//...

filemoveTask.ContinueWith(t => 
{
    if (t.Result)
    {
        RegisterFileMoveinDB();
    }
}, exclusiveScheduler);
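
A similar serializing effect can be had with types that do exist in .NET 4.0, which also comes close to the 'signal the main task' idea from the question. The following is only a sketch under stated assumptions: `MoveRegistrar`, `MoveAndRegister`, and the `moveFile` and `registerMove` callbacks are hypothetical names, and `moveFile` is assumed to return false on failure rather than throw. Each move task drops the file name into a `BlockingCollection<string>` when it succeeds, and the calling thread alone drains the collection, so a single DB context would only ever be touched from that one thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class MoveRegistrar
{
    // Moves files in parallel, then registers each successful move on the
    // calling thread. Returns the number of moves registered.
    // moveFile and registerMove are hypothetical callbacks.
    public static int MoveAndRegister(
        IEnumerable<string> fileNames,
        Func<string, bool> moveFile,
        Action<string> registerMove)
    {
        using (var completed = new BlockingCollection<string>())
        {
            var moves = new List<Task>();
            foreach (var name in fileNames)
            {
                var fileName = name; // local copy so each closure sees its own value
                moves.Add(Task.Factory.StartNew(() =>
                {
                    // Signal the consumer only when the move succeeded.
                    if (moveFile(fileName))
                    {
                        completed.Add(fileName);
                    }
                }));
            }

            // Once every move has finished, tell the consumer nothing more is coming.
            Task allDone = null;
            if (moves.Count == 0)
            {
                completed.CompleteAdding();
            }
            else
            {
                allDone = Task.Factory.ContinueWhenAll(
                    moves.ToArray(), finished => completed.CompleteAdding());
            }

            // Single consumer: registrations run one at a time on this thread.
            int registered = 0;
            foreach (var fileName in completed.GetConsumingEnumerable())
            {
                registerMove(fileName);
                registered++;
            }

            if (allDone != null)
            {
                allDone.Wait(); // make sure CompleteAdding has returned before disposing
            }
            return registered;
        }
    }
}
```

The design choice here is that registrations start as soon as the first move finishes, rather than after all moves are done, while the DB work itself never runs concurrently.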

Theodor Zoulias
0

Regarding @Moho's question:

  1. The threads used by built-in asynchronous IO operations are taken from the CLR thread pool, so it is a very efficient mechanism. Creating threads yourself is the old approach, which is inefficient, especially for IO operations.

  2. When you start an asynchronous operation, you don't have to wait for it immediately. Postpone waiting until the result is actually needed.
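
The second point can be sketched with plain .NET 4.0 tasks. This is a minimal illustration, not code from the question: `DeferredWait` and `SquareAll` are hypothetical names, and squaring a number stands in for real work. All tasks are started first, and the caller only blocks at the point where the results are needed.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredWait
{
    // Starts one task per input and waits only when the results are needed.
    public static int[] SquareAll(int[] inputs)
    {
        // Start every operation without blocking on any of them.
        Task<int>[] tasks = inputs
            .Select(x => Task.Factory.StartNew(() => x * x))
            .ToArray();

        // Other work could run here while the tasks are in flight.

        // Block only now, at the point where the results are required.
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```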

Best Regards.

Green