
I am really confused about Parallel.ForEach... How does it work?
The code below throws an error -> File Is In Use

Parallel.ForEach(list_lines_acc, (line_acc, list_lines_acc_state) =>
{
     FileStream file = 
         new FileStream(GPLfilePath, FileMode.Open, FileAccess.ReadWrite);
     StreamReader reader = new StreamReader(file);
     var processed = string.Empty;
     Ok_ip_port = string.Empty;
     while (!reader.EndOfStream)
     {
         if (string.IsNullOrEmpty(Ok_ip_port))
         {
             Ok_ip_port = reader.ReadLine();
         }
         else
         {
             string currentLine = reader.ReadLine();
             processed += currentLine + Environment.NewLine;
         }
     }
     StreamWriter writer = new StreamWriter(file);
     writer.Write(processed);

     reader.Close();
     writer.Close();
     file.Close();
});  

Would you please show me how I can fix that? This code is just an example.

I want to work with string arrays & Lists inside Parallel.ForEach, but there is always a problem when adding to or editing those collections. Can you please provide an example? I am using Visual Studio 2010 + .NET Framework 4.0.

Dave Zych
SilverLight

4 Answers


In your code, as written, each thread is using the same file, and effectively trying to append to it. Even if this could work, you would have a bad race condition (as the threads would be trying to append to the same file simultaneously).

The error you're seeing is purely because you're using the same file in each loop iteration, so when you try to open the file (after the first iteration), it's erroring out as it's opened by a different loop iteration.

Also, you're never using your loop variable (line_acc), so there is really no need for a loop here at all. This could be written without the Parallel.ForEach, and you'd have the same result, with no issues.

That being said - if this is example code, you'll find that loops bound purely by file I/O tend not to parallelize well. The actual drive being used becomes the limiting factor, so code that purely reads and writes a file in parallel will often run slower, not faster, than the sequential equivalent.
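As an illustration, the whole operation can be done sequentially with no Parallel.ForEach at all. This is only a sketch: the path and the sample lines written first are made up so the snippet is self-contained, standing in for GPLfilePath and the file's real contents.

```csharp
using System;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
        // Illustrative stand-in for GPLfilePath from the question.
        string GPLfilePath = "ips.txt";
        File.WriteAllLines(GPLfilePath, new[] { "1.2.3.4:80", "5.6.7.8:443", "9.9.9.9:22" });

        // Read the file once, take the first line, write the rest back.
        // One reader, one writer, one thread: no "file in use" error is possible.
        string[] lines = File.ReadAllLines(GPLfilePath);
        string Ok_ip_port = lines.Length > 0 ? lines[0] : string.Empty;
        File.WriteAllLines(GPLfilePath, lines.Skip(1).ToArray());

        Console.WriteLine(Ok_ip_port); // prints "1.2.3.4:80"
    }
}
```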

I want to work with string arrays & Lists inside Parallel.ForEach, but there is always a problem for adding or editing those collections

The code you're showing "as an example" is doing none of this, so it's difficult to see where your issue might be occurring. You can write to an array or List<T> by index, but you can't add to a list in a parallel loop without extra synchronization (such as a lock), as List<T> is not thread safe for writes. If you are trying to read and write from collections, you might consider looking at the System.Collections.Concurrent namespace, which contains thread safe collections you can safely use in Parallel.ForEach loops.
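For example, ConcurrentBag<T> supports Add from multiple threads, where List<T>.Add would corrupt the list's internal state. This is a sketch; the input range and the squaring step are made up purely for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var input = Enumerable.Range(1, 100).ToList();

        // ConcurrentBag<T> is thread-safe for Add, so each iteration
        // can push its result without any explicit lock.
        var results = new ConcurrentBag<int>();

        Parallel.ForEach(input, n =>
        {
            results.Add(n * n);
        });

        Console.WriteLine(results.Count); // 100
    }
}
```

Note that a bag is unordered; if the output order matters, collect results by index into a pre-sized array instead.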

Reed Copsey

As noted in this question:

You are not synchronizing access to index and that means you have a race on it. That's why you have the error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment.

private static void Func<T>(IEnumerable<T> docs)
{
    int index = -1;
    Parallel.ForEach(
        docs, doc =>
        {
            int nextIndex = Interlocked.Increment(ref index);
            CreateFolderAndCopyFile(nextIndex);
        }
    );
}

However, as others suggest, the alternative overload of ForEach that provides a loop index is clearly a cleaner solution to this particular problem.

But when you get it working you will find that copying files is IO bound rather than processor bound and I predict that the parallel code will be slower than the serial code.
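As a sketch of that overload (the docs array and the per-slot writes below are illustrative, not from the question): the third lambda parameter is a long giving each element's position, so no shared counter or Interlocked call is needed.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        string[] docs = { "a.doc", "b.doc", "c.doc" };
        string[] results = new string[docs.Length];

        // The (item, loopState, index) overload hands each iteration
        // its element's index; writing to a distinct array slot per
        // iteration is safe without any synchronization.
        Parallel.ForEach(docs, (doc, loopState, index) =>
        {
            results[index] = index + ":" + doc;
        });

        Console.WriteLine(string.Join(",", results)); // 0:a.doc,1:b.doc,2:c.doc
    }
}
```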

Codeman
  • This question makes no use of an index, even if it did; synchronizing access to it wouldn't help at all with "File in use" errors... – Peter Ritchie Sep 24 '12 at 18:27
  • @PeterRitchie the intention of the answer is to inform OP that an index is the proper way to avoid these issues. I would have suggested a mutex, but I don't know if a primitive is necessary to solve his problem. – Codeman Sep 24 '12 at 18:28
  • Index of what? There's no index there now, how would adding an index help at all? – Peter Ritchie Sep 24 '12 at 18:32
  • Adding an index to properly access each file in order would prevent deadlocking issues that OP is currently experiencing. Doing a `ForEach` implies that the object upon which the operation is occurring is IEnumerable, which is indexable. – Codeman Sep 24 '12 at 18:57
  • Is there some other question the OP has asked? This question is about "file in use", not dead locking. And enforcing order is the opposite of "Parallel". If order of operations is important (which this question does not detail), *not using* Parallel.For* would be a better approach... Nothing you detailed would do anything w.r.t order with the OP's code. – Peter Ritchie Sep 24 '12 at 19:09

Use a lock object around the problematic code. Execution will wait for the lock to be released, so you will never have multiple threads accessing the resource. Parallel.ForEach won't add performance in this case. Here is a simple example:

private Object fileLock = new Object();
private void WriteLog(string line)
{
    lock (fileLock)
    {
        string strNomLog = @".\MyFile.log";
        System.IO.File.AppendAllText(strNomLog, line);
    }
}
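Called from a parallel loop, every write then funnels through the lock. The sketch below shows this end to end; the log path and sample items are illustrative, and File.AppendAllText stands in for the original's relative path:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class Logger
{
    private static readonly object fileLock = new object();

    private static void WriteLog(string line)
    {
        // Only one thread at a time can hold fileLock, so the file
        // is never open in two iterations at once.
        lock (fileLock)
        {
            File.AppendAllText("MyFile.log", line + Environment.NewLine);
        }
    }

    static void Main()
    {
        File.Delete("MyFile.log");
        Parallel.ForEach(new[] { "a", "b", "c", "d" }, item => WriteLog(item));
        Console.WriteLine(File.ReadAllLines("MyFile.log").Length); // 4
    }
}
```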
Guish

To get rid of a file in use error (assuming that it's in use because another thread is writing to it) you have to synchronize access to the file. This generally means that each parallel execution is waiting on other executions to finish writing and defeats the purpose of running in parallel.
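One way to keep the parallelism while avoiding that contention is to do the work in memory and touch the file only once, sequentially, afterwards. A sketch (the item range, the "result" strings, and results.txt are all illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Do the (potentially CPU-bound) work in parallel, collecting
        // into a thread-safe queue instead of writing per-iteration.
        var lines = new ConcurrentQueue<string>();
        Parallel.ForEach(Enumerable.Range(0, 50), i => lines.Enqueue("result " + i));

        // Then open the file exactly once, on one thread: the parallel
        // part never touches it, so "file in use" cannot occur.
        File.WriteAllLines("results.txt", lines.ToArray());

        Console.WriteLine(File.ReadAllLines("results.txt").Length); // 50
    }
}
```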

Peter Ritchie