
Is using the FileStream class to write to a file while using the .NET File.Copy method to copy the same file thread-safe? It seems like the operating system should safely handle concurrent access to the file, but I can't find any documentation on this. I've written a simple application to test it and am seeing weird results. The copy of the file shows as 2 MB, but when I inspect its contents with Notepad++ it's empty. The original file contains data.

using System;
using System.Threading.Tasks;
using System.Threading;
using System.IO;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePath = Environment.CurrentDirectory + @"\test.txt";
            using (FileStream fileStream = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite))
            {
                Task fileWriteTask = Task.Run(() =>
                    {
                        for (int i = 0; i < 10000000; i++)
                        {
                            fileStream.WriteByte((Byte)i);
                        }
                    });

                Thread.Sleep(50);
                File.Copy(filePath, filePath + ".copy", true);
                fileWriteTask.Wait();
            }
        }
    }
}

Thanks for the help!

r2_118
  • I'm not sure there's anything C#-related in this question - this seems to be more about how the Windows API handles shared read / write operations. – Baldrick Jun 24 '14 at 03:15
  • Side note: checking content of binary file with text editor is not the best check, make sure to open it in binary mode. – Alexei Levenkov Jun 24 '14 at 03:28
  • What is the larger problem you're trying to solve? It's pretty clear that the code you have isn't going to work. – Jim Mischel Jun 24 '14 at 13:53
  • Note that unless the copy of the file is on a different physical hard drive than the original, trying to write to two files at the same time will *dramatically reduce* that disk's write speed (as it needs to move the disk head around between the sectors for each file), and as this process will almost certainly be IO bound, not CPU bound, the CPU concurrency gains you nothing. In short, you should see *dramatically improved speed* by *using only one thread* here. – Servy Jun 24 '14 at 16:12
  • @JimMischel The larger problem is that I've inherited some code that I need to refactor where the application is regularly writing data to a file. In the class managing these files it has a method to make a copy of a file. In one thread the application is still writing to it and in the other it makes a copy. The way it does this right now is that it disposes the object containing the FileStream (let's call it FileWriteClass), makes a copy of the file and then creates a new FileWriteClass. Disposing and recreating the FileWriteClass object is adding some complication that I want to eliminate. – r2_118 Jun 24 '14 at 17:51
  • @r2_118 look at the second answer in the question i linked to in my answer – AK_ Jun 24 '14 at 18:10
  • My suggestion would be to close this question and ask a new one, explaining in detail what you want to do. – Jim Mischel Jun 24 '14 at 18:13
  • @AK_ thanks for putting together all of the different options. I've marked it as the answer. – r2_118 Jun 24 '14 at 22:12

3 Answers


It is thread-safe in the sense that neither C# object will be corrupted.

The result of the operation, however, will be more or less random (an empty file, a partial copy, an access-denied error) and depends on the sharing mode used to open the file for each operation.

If carefully set up, this can produce sensible results. E.g. flushing the file after each line and specifying a compatible share mode will let you be reasonably sure that complete lines are copied.
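A minimal sketch of that "carefully set up" scenario (the class name and temp-file paths are just for illustration): the writer opens the file with an explicit FileShare.Read, flushes a complete line out of FileStream's internal buffer, and only then takes a copy.

```csharp
using System;
using System.IO;
using System.Text;

class FlushedCopySketch
{
    static void Main()
    {
        // Illustrative path; any writable location works.
        string path = Path.Combine(Path.GetTempPath(), "flushed-copy-demo.txt");

        // Explicitly allow other handles to read while we hold the file for writing.
        using (var writer = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
        {
            byte[] line = Encoding.UTF8.GetBytes("complete line");
            writer.Write(line, 0, line.Length);
            writer.Flush(); // push the finished line out of FileStream's buffer to the OS

            // A copy taken after the flush contains every line flushed so far.
            File.Copy(path, path + ".copy", true);
        }

        Console.WriteLine(File.ReadAllText(path + ".copy"));
    }
}
```

Without the Flush() call the copy can easily come out empty, which is exactly what the question observed: the bytes were still sitting in FileStream's in-process buffer.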

Alexei Levenkov
  • You'll need some type of mediator in order to make that work. Even if you flush after every write, the read could read one block and then read the next before the next write. It would then see end of file, and quit. – Jim Mischel Jun 24 '14 at 03:28
  • @JimMischel - my suggestion was to get something that looks consumable (i.e. a log file), but indeed, if one wants to get a complete file, way more code needs to be written to allow parallel write and copy... E.g. one can look at the source for one of the variants of [tee](http://en.wikipedia.org/wiki/Tee_%28Unix%29) to see how cloning can be done at run time. – Alexei Levenkov Jun 24 '14 at 03:34

It depends.

It depends on what you mean by "thread safe".

First of all, look at this constructor:

public FileStream(string path, FileMode mode, FileAccess access, FileShare share)

Notice the last parameter: it states what you allow other threads and processes to do with the file. The default, which applies to the constructors that don't take it, is FileShare.Read, which means you allow others to open the file as read-only. This is of course unwise if you are writing to it.

That's basically what you did: you opened a file for writing while allowing others to read it, and "read" includes copying.

Also please note that without this: fileWriteTask.Wait(); at the end of your code, your entire function isn't thread safe, because the FileStream might be closed before the task even starts writing.

Windows does make file access thread safe, but in a fairly non-trivial manner. For example, if you had opened the file with FileShare.None, File.Copy would have failed, and to the best of my knowledge there isn't an elegant way to handle this in .NET. The general approach Windows uses to synchronize file access is called optimistic concurrency: assume your action is possible, and fail if it isn't.

This question discusses waiting for a file lock in .NET.
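The optimistic-concurrency approach usually boils down to a retry loop: attempt the operation, and if a sharing violation surfaces as an IOException, back off and try again. A hedged sketch (the helper name, attempt count, and back-off delay are arbitrary choices, not anything from the linked question):

```csharp
using System;
using System.IO;
using System.Threading;

class CopyRetrySketch
{
    // Try the copy; if another handle blocks it with a sharing violation,
    // back off briefly and try again, up to a fixed number of attempts.
    static void CopyWithRetry(string src, string dst, int attempts)
    {
        for (int i = 0; ; i++)
        {
            try
            {
                File.Copy(src, dst, true);
                return;
            }
            catch (IOException) when (i < attempts - 1)
            {
                Thread.Sleep(100); // wait for the other handle to close
            }
        }
    }

    static void Main()
    {
        string src = Path.Combine(Path.GetTempPath(), "retry-demo.txt");
        File.WriteAllText(src, "data");

        // No competing handle here, so the first attempt succeeds.
        CopyWithRetry(src, src + ".copy", 5);
        Console.WriteLine(File.ReadAllText(src + ".copy"));
    }
}
```

Note the exception filter (`catch ... when`) lets the final failed attempt propagate the original IOException unchanged.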

Sharing files between processes is a common issue, and one of the ways to do it, mostly for inter-process communication, is memory-mapped files; this is the MSDN documentation.
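To make the memory-mapped idea concrete, here is a small sketch using `System.IO.MemoryMappedFiles` (the temp-file backing is just for a self-contained demo; real IPC would typically use a named map on Windows so a second process can open the same view):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmfSketch
{
    static void Main()
    {
        // Back the map with an ordinary temp file, pre-sized to 8 bytes.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[8]);

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            accessor.Write(0, 42L);                   // one side writes through the view
            Console.WriteLine(accessor.ReadInt64(0)); // the other side reads the same memory
        }
    }
}
```

Both "sides" here share one accessor for brevity; the point is that reads and writes go through shared memory rather than through competing file handles.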

If you are brave and willing to play around with the WinAPI and overlapped IO: if I remember correctly, LockFileEx allows nice file locking...

Also, there was once a magical thing called Transactional NTFS, but it has moved on into the realm of Microsoft's deprecated technologies.

AK_

The answer is no. You cannot, in general, operate on file system objects from different threads and achieve consistent or predictable results for the file contents.

Individual .NET Framework functions may or may not be thread-safe, but this is of little consequence. The timing and order in which data is read from, written to, or copied between individual files on disk is essentially non-deterministic: if you do the same thing multiple times you will get different results, depending on factors outside your control such as machine load and disk layout.

The situation is made worse because the Windows API responsible for File.Copy runs in a system process and is only loosely synchronised with your program.

The bottom line is that if you want file-level synchronisation you have no choice but to use file-level primitives to achieve it: things like open/close, flushing and locking. Finding combinations that work is non-trivial.

In general you are better off keeping all the operations on a file inside one thread, and synchronising access to that thread.
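One way to read that advice, sketched against the code in the question (the lock object and sizes are my additions, not the answer's): serialise the writes and the copy on a single lock, and flush before copying, so the copy always sees a consistent prefix of what has been written.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class SynchronizedCopySketch
{
    static readonly object FileLock = new object();

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "sync-demo.bin");
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
        {
            Task writerTask = Task.Run(() =>
            {
                for (int i = 0; i < 1000; i++)
                {
                    lock (FileLock)
                    {
                        fs.WriteByte((byte)i);
                    }
                }
            });

            lock (FileLock)
            {
                // While we hold the lock the writer cannot interleave, so the
                // copy sees a consistent prefix of the data written so far.
                fs.Flush();
                File.Copy(path, path + ".copy", true);
            }

            writerTask.Wait();
        }

        // The copy's length is some prefix of the 1000 bytes, never more.
        Console.WriteLine(new FileInfo(path + ".copy").Length <= 1000);
    }
}
```

This doesn't make the copy deterministic (it may catch any prefix of the writes), but it does rule out torn or half-flushed records, which is usually the property people actually need.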


In answer to a comment, if you operate on a file by making it memory-mapped, the in-memory contents are not guaranteed to be consistent with the on-disk contents until the file is closed. The in-memory contents can be synchronised between processes or threads, but the on-disk contents cannot.

A named mutex locks as between processes, but does not guarantee anything as to consistency of file system objects.

File system locks are one of the ways I mentioned that could be used to ensure file system consistency, but in many situations there are still no guarantees. You are relying on the operating system to invalidate cached disk contents and flush to disk, and this is not guaranteed for all files at all times. For example, it may be necessary to use the FILE_FLAG_NO_BUFFERING, FILE_FLAG_OVERLAPPED and FILE_FLAG_WRITE_THROUGH flags, which may severely affect performance.

Anyone who thinks this is an easy problem with a simple one-size-fits-all solution has simply never tried to get it to work in practice.

david.pfx
  • -1 Of course you can. It's routinely done with memory-mapped files. You can lock files on Windows, and if you need better synchronisation you can use a named mutex. – AK_ Jun 24 '14 at 16:28
  • If you think that's an answer, I don't think you understand the question. See edit. – david.pfx Jun 25 '14 at 04:03