
Here's the scenario: I have 10,000 XML files that I want to read in and save to a database. I have 5 Windows Services that are all hitting the folder, trying to process the files.

My technique is to first try to rename the file (File.Move) with an extension specific to the given service instance.

This works 99.9% of the time. However, roughly 0.1% of the time the file system appears to allow two requests to rename the same file at EXACTLY the same time.

How can I prevent this? Does this make sense? See the following code snippet to get an idea. Out of the 10,000 files, I end up with about 10 that hit IOExceptions.

string[] sourceFiles = Directory.GetFiles(InputPath, string.Format(LocaleHelper.Culture, "*.{0}", Extention))
                                .OrderBy(d => new FileInfo(d).CreationTime)
                                .ToArray();

foreach (string file in sourceFiles)
{
    var newFileName = string.Format(LocaleHelper.Culture, "{0}.{1}", file, CacheFlushManager.GetInstanceName);

    try
    {
        // First, try to claim the file by renaming it. By this point the file
        // may no longer exist; if so, File.Move throws and we move on to the
        // next file.
        File.Move(file, newFileName);

        var xml = File.ReadAllText(newFileName);

        // Write to the DB. At this point we know the file is uniquely ours.
    }
    catch (FileNotFoundException)
    {
        Logger.LogDebug(string.Format(LocaleHelper.Culture, "{0} Couldn't read file : {1}", CacheFlushManager.GetInstanceName, newFileName));
    }
    catch (IOException)
    {
        Logger.LogDebug(string.Format(LocaleHelper.Culture, "{0} Couldn't process file : {1}", CacheFlushManager.GetInstanceName, newFileName));
    }
    catch (Exception ex)
    {
        Logger.LogError("Execute: Error", ex);

        try
        {
            File.Move(newFileName, string.Format(LocaleHelper.Culture, "{0}.badfile", newFileName));
        }
        catch (Exception ex_deep)
        {
            Logger.LogError(string.Format("{0} Execute: Error Deep: could not move bad file {1}", CacheFlushManager.GetInstanceName, newFileName), ex_deep);
        }
    }
}

EDIT 1

Below is the exact error as an example of what I'm seeing. I'm very confused how the file could be in use at that exact moment, based on the code I'm using. Am I completely out in the weeds with this?

[7220] TransactionFileServiceProcess [11:28:32]: Service4 Couldn't process file : C:\temp\Input\yap804.xml.Service4 System.IO.IOException: The process cannot access the file 'C:\temp\Input\yap804.xml.Service4' because it is being used by another process.

EDIT 2

Here is a look at what is going on from a "debug" perspective. How could both Services 2 and 3 get to "End Rename"? I think this is the crux of the issue... thoughts?

The problem is that the file yap620.xml.Service3 will ultimately just sit out there because of the file operation error.

[6708] TransactionFileServiceProcess [10:54:38]: Service3 Start Rename: C:\temp\Input\yap620.xml.Service3 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]

[4956] TransactionFileServiceProcess [10:54:38]: Service2 Start Rename: C:\temp\Input\yap620.xml.Service2 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]

[7416] TransactionFileServiceProcess [10:54:38]: Service4 Start Rename: C:\temp\Input\yap620.xml.Service4 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]

[6708] TransactionFileServiceProcess [10:54:38]: Service3 End Rename: C:\temp\Input\yap620.xml.Service3 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]

[6708] TransactionFileServiceProcess [10:54:38]: Service3 Start Read: C:\temp\Input\yap620.xml.Service3 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]

[4956] TransactionFileServiceProcess [10:54:38]: Service2 End Rename: C:\temp\Input\yap620.xml.Service2 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]

[4956] TransactionFileServiceProcess [10:54:38]: Service2 Start Read: C:\temp\Input\yap620.xml.Service2 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]

[6708] TransactionFileServiceProcess [10:54:38]: Service3 Couldn't process file : C:\temp\Input\yap620.xml.Service3 TransactionFileServiceProcess.Execute => BHSLogger.LogDebug => LoggerImpl.Write E[]
aherrick
  • 10,000 XML files? Why not use JSON? – thenewseattle Nov 15 '13 at 18:12
  • I don't have a choice over what file format they are in unfortunately. And in this scenario it wouldn't matter. – aherrick Nov 15 '13 at 18:14
  • Trying to eliminate collisions does not make sense - you seem to already have enough code to avoid it - so just expect collisions (or any other IO errors) to happen and retry. – Alexei Levenkov Nov 15 '13 at 18:15
  • You mean you have five services that do exactly this at the same time on the same folder? – CodeCaster Nov 15 '13 at 18:26
  • @CodeCaster yes, exactly. Think of it like load balancing. Each service would exist on its own server, all pointed at 1 directory. – aherrick Nov 15 '13 at 18:26
  • "This is working 99% of the time. However what I am seeing is the file system will .01% of the time allow two requests to try and rename at EXACTLY the same time" What happens in that case? One should succeed, the other fail. – usr Nov 15 '13 at 18:50
  • Are you saying that the `File.Move` method succeeds for two different processes trying to rename the same file? That seems unlikely. – Jim Mischel Nov 15 '13 at 22:40
  • http://msdn.microsoft.com/en-us/library/windows/hardware/ff540344(v=vs.85).aspx specifies that a rename can only succeed if no other handle to the file exists. So according to the docs, renames are atomic with respect to one another. This is also an important guarantee that I expect from a file system. – usr Nov 16 '13 at 16:59
  • This is the exact error I'm seeing captured with debug view and logger. Again with 10k files it happens 5-10 times. [7220] TransactionFileServiceProcess [11:28:32]: Service4 Couldn't process file : C:\temp\Input\yap804.xml.Service4 System.IO.IOException: The process cannot access the file 'C:\temp\Input\yap804.xml.Service4' because it is being used by another process. – aherrick Nov 18 '13 at 16:30
  • @aherrick, did you manage to resolve the issue? It seems I've faced the same one: http://stackoverflow.com/questions/31303474/concurrent-file-move-of-the-same-file – Eugene D. Gubenkov Jul 08 '15 at 21:41

2 Answers


I don't see where the issue is. You have multiple threads that get a list of files, and then try to process those files. Sometimes the file that the thread is trying to rename doesn't exist, and sometimes the file exists but it is in the process of being renamed by another thread. Neither one of those two should be a problem. In either case, the thread that gets the error should just assume that some other thread is processing the file, and move on.

Assuming, of course, that you don't have some other process that's accessing files in that directory.
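In code, "just move on" can be a small claim helper built on the same rename trick the question already uses. A minimal sketch; TryClaim, FileClaimer, and instanceName are names I made up for illustration:

using System.IO;

static class FileClaimer
{
    // Returns true only if this instance won the rename race for the file.
    public static bool TryClaim(string file, string instanceName, out string claimedName)
    {
        claimedName = file + "." + instanceName;
        try
        {
            File.Move(file, claimedName);
            return true;  // we won the race; the file now belongs to this instance
        }
        catch (FileNotFoundException)
        {
            return false; // another instance already renamed the file away
        }
        catch (IOException)
        {
            return false; // another instance is touching it right now; skip it
        }
    }
}

Each worker calls TryClaim on every candidate file and only reads it and writes to the database when it returns true. A false result is expected traffic, not an error worth logging at error level.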

Why you'd want five separate service instances doing this is beyond me. You could simplify things quite a bit and cut down on unnecessary I/O by having just one process do a Parallel.ForEach. For example:

string[] sourceFiles = Directory.GetFiles(
    InputPath,
    string.Format(LocaleHelper.Culture, "*.{0}", Extention))
    .OrderBy(d => new FileInfo(d).CreationTime).ToArray();

Parallel.ForEach(sourceFiles, file =>
{
    // do file processing here; the TPL hands each element of sourceFiles
    // to exactly one worker, so no two threads ever get the same file
});

The TPL will allocate multiple threads to do the processing, and assign work items to the threads. So there's no chance that a file will be open by multiple threads.
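If you want to control how many workers run at once rather than letting the TPL decide, you can cap it explicitly. A sketch; the limit of 5 is just an assumption carried over from the five-service setup in the question:

using System.Threading.Tasks;

// Cap the number of concurrent workers at 5 to mirror the original
// five service instances; tune this number as needed.
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };

Parallel.ForEach(sourceFiles, options, file =>
{
    // rename, read, and save to the DB exactly as before
});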

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
  • Hey Jim thanks for the answer. To answer your question about the load balancing scenario: let's say there are 5 different servers on a domain, each server with a service running, all pointing to one folder share. – aherrick Nov 22 '13 at 20:34

Do you have multiple threads running in the same service? Or multiple independent services?

If you have multiple threads in the same service, just create a shared queue of files and have threads pull items off it whenever they are free to process one. Note that the standard Queue<T> is not thread safe, but ConcurrentQueue<T> is, so with it you should never process the same file twice. See the sketch below.
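A minimal sketch of that idea, assuming a single service with a few worker threads; ProcessAll, the worker count of 4, and ProcessFile are placeholders for your own members:

using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

void ProcessAll(string inputPath)
{
    // Fill a thread-safe queue once, then let several workers drain it.
    var queue = new ConcurrentQueue<FileInfo>(
        new DirectoryInfo(inputPath).GetFiles("*.xml"));

    Parallel.For(0, 4, workerId =>
    {
        FileInfo file;
        while (queue.TryDequeue(out file)) // each file is dequeued exactly once
        {
            ProcessFile(file); // your existing read-and-save-to-DB logic
        }
    });
}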

If you have multiple independent services you could look at using LockFile or File.Open with FileShare.None specified.
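With the FileShare.None route, the claim is taken by opening the file exclusively instead of renaming it. A rough sketch; "file" is one path from Directory.GetFiles, and the delete-after-processing step is my assumption about what finishing a file means here:

using System.IO;

// Opening with FileShare.None means any other service that tries to open
// the file at the same time gets an IOException, so only one instance can
// ever be processing a given file.
try
{
    using (FileStream fs = File.Open(file, FileMode.Open,
                                     FileAccess.Read, FileShare.None))
    using (StreamReader reader = new StreamReader(fs))
    {
        string xml = reader.ReadToEnd();
        // write to the DB here while the exclusive handle is held
    }
    File.Delete(file); // processed; remove it so nobody picks it up again
}
catch (IOException)
{
    // another service holds the lock (or already removed the file); skip it
}

There is still a small window between closing the stream and deleting the file, so a tolerant IOException handler like the one above remains necessary, just as in the question's rename approach.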

edit:

I misunderstood what you were trying to do. I thought you wanted all files to be processed by each of the services. You really need to run these as multiple threads in the same service, or add some method of communication that lets the different services ascertain which files have already been processed.

FlyingStreudel
  • Hey dude... so, multiple independent services, each with 1 thread. How would I use LockFile to rename? – aherrick Nov 15 '13 at 18:26
  • You wouldn't rename; you would just acquire a lock, and any service that tried to access the file at the same time would either wait for the lock or make a note of that file and retry later (or not). – FlyingStreudel Nov 15 '13 at 18:27
  • but what I want is: if there are 10 files and 5 service processes, each process essentially handles 2. – aherrick Nov 15 '13 at 18:35
  • Why can't you do this all from one service, then? – FlyingStreudel Nov 15 '13 at 18:38