
I'm trying to create a directory and copy a file (pdf) inside a Parallel.ForEach.

Below is a simple example:

    private static void CreateFolderAndCopyFile(int index)
    {
        const string sourcePdfPath = "c:\\testdata\\test.pdf";
        const string rootPath = "c:\\testdata";

        string folderDirName = string.Format("Data{0}", string.Format("{0:00000000}", index));

        string folderDirPath = rootPath + @"\" + folderDirName;

        Directory.CreateDirectory(folderDirPath);

        string desPdfPath = folderDirPath + @"\" + "test.pdf";

        File.Copy(sourcePdfPath, desPdfPath, true);

    }

The method above creates a new folder and copies the pdf file to a new folder. It creates this dir tree:

TESTDATA
  -Data00000000
      -test.pdf
  -Data00000001
      -test.pdf
....
  -Data0000000N
      -test.pdf

I tried calling the CreateFolderAndCopyFile method in a Parallel.ForEach loop.

    private static void Func<T>(IEnumerable<T> docs)
    {
        int index = 0;
        Parallel.ForEach(docs, doc =>
                                   {
                                       CreateFolderAndCopyFile(index);
                                       index++;
                                   });
    }

When I run this code it finishes with the following error:

The process cannot access the file 'c:\testdata\Data00001102\test.pdf' because it is being used by another process.

But first it created 1111 new folders and copied test.pdf about 1111 times before I got this error.

What caused this behaviour and how can it be resolved?

EDITED:

The code above was a toy sample, sorry for the hard-coded strings. Conclusion: the parallel method is slow.

Tomorrow I'll try some methods from How to write super-fast file-streaming code in C#?.

especially: http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/

burnttoast11
Mike
    As an aside, you should probably use `Path.Combine` instead of concatenating the path yourself. – M.Babcock Mar 28 '12 at 18:10
    @Mike I notice that you have not yet voted or accepted an answer on the site. May I suggest that you have a read of the [faq] to learn about these aspects of the Stack Overflow community. – David Heffernan Mar 28 '12 at 18:33
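Following M.Babcock's comment, a sketch of the question's method rewritten with `Path.Combine` (same behaviour as the original, just without manual separator concatenation; this rewrite is illustrative, not from the original post):

```csharp
using System.IO;

private static void CreateFolderAndCopyFile(int index)
{
    const string sourcePdfPath = @"c:\testdata\test.pdf";
    const string rootPath = @"c:\testdata";

    // Path.Combine inserts the directory separators for us
    string folderDirPath = Path.Combine(rootPath, string.Format("Data{0:00000000}", index));
    Directory.CreateDirectory(folderDirPath);

    string destPdfPath = Path.Combine(folderDirPath, "test.pdf");
    File.Copy(sourcePdfPath, destPdfPath, true);
}
```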

2 Answers

18

You are not synchronizing access to index and that means you have a race on it. That's why you have the error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment.

private static void Func<T>(IEnumerable<T> docs)
{
    int index = -1;
    Parallel.ForEach(
        docs, doc =>
        {
            int nextIndex = Interlocked.Increment(ref index);
            CreateFolderAndCopyFile(nextIndex);
        }
    );
}

However, as others suggest, the alternative overload of ForEach that provides a loop index is clearly a cleaner solution to this particular problem.

But when you get it working you will find that copying files is IO-bound rather than processor-bound, and I predict that the parallel code will be slower than the serial code.
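For comparison, a plain serial version (a sketch, reusing the question's `CreateFolderAndCopyFile`) needs no synchronization at all:

```csharp
private static void Func<T>(IEnumerable<T> docs)
{
    int index = 0;
    foreach (T doc in docs)
    {
        // No race: only one thread ever touches index
        CreateFolderAndCopyFile(index);
        index++;
    }
}
```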

David Heffernan
  • This has to be the most overcomplicated foreach I have ever seen. Congratulations Parallels! – Nahydrin Mar 28 '12 at 18:13
    @BrianGraham A 2 line ForEach is complicated? – Andrew T Finnell Mar 28 '12 at 18:17
    @AndrewFinnell I presume Brian is referring to the `index` handling – David Heffernan Mar 28 '12 at 18:18
  • @AndrewFinnell `for (int i = 0; i < docs.Count; i++) CreateFolderAndCopyFile(i);` << a lot less work than using parallels... – Nahydrin Mar 28 '12 at 18:18
  • @BrianGraham Fine, but that's serial. And presumably Mike believes that a parallel algo will speed up the file copying. – David Heffernan Mar 28 '12 at 18:20
  • @DavidHeffernan perhaps, but it looks sloppy to loop through a list and not even use what you loop through... – Nahydrin Mar 28 '12 at 18:22
  • @BrianGraham You mean not using `doc`. Agreed that is odd. My guess is that Mike is just experimenting to get a feel for performance. Note the made up file names. – David Heffernan Mar 28 '12 at 18:24
  • @sixlettervariables Thanks. I already up-voted your answer and gave a call out to that aspect in the penultimate paragraph in my answer. I've never used `Parallel`, and practically never use C# so I'm not familiar with exactly what's available in these libraries. Although I can spot an unsynchronized shared variable a mile off!! – David Heffernan Mar 28 '12 at 18:35
  • You are right, the parallel version is slower. What do you suggest for fast file copying? Do you have any idea? – Mike Mar 28 '12 at 19:55
    You won't do better than system provided file copy routines. – David Heffernan Mar 28 '12 at 20:14
  • @all: Tomorrow I try some methods from http://stackoverflow.com/questions/955911/how-to-write-super-fast-file-streaming-code-in-c. Maybe from this blog: http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/ – Mike Mar 28 '12 at 20:46
8

Your increment operation on index is suspect because it is not thread safe. If you change the operation to Console.WriteLine("{0}", index++) you will see this behaviour.

Instead you could use a Parallel.ForEach overload with a loop index:

private static void Func<T>(IEnumerable<T> docs)
{
    // nb: index is 'long' not 'int'
    Parallel.ForEach(docs, (doc, state, index) =>
                            {
                                // cast needed: the loop index is long, the method takes int
                                CreateFolderAndCopyFile((int)index);
                            });
}
user7116