Within a tool that copies big files between disks, I replaced the System.IO.FileInfo.CopyTo method with System.IO.Stream.CopyToAsync. This allows a faster copy and better control during the copy, e.g. I can stop the copy. But it creates even more fragmentation of the copied files, which is especially annoying when I copy files of many hundreds of megabytes.

How can I avoid disk fragmentation during copy?

With the xcopy command, the /j switch copies files without buffering, and it is recommended for very large files on TechNet. It does indeed seem to avoid file fragmentation (while a simple file copy within Windows 10 Explorer DOES fragment my files!).

A copy without buffering seems to be the opposite of this async copy. Is there any way to do an async copy without buffering?
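For reference, a minimal sketch of what an unbuffered destination stream could look like. FileOptions.WriteThrough is documented (it maps to FILE_FLAG_WRITE_THROUGH), but there is no documented FileOptions value for FILE_FLAG_NO_BUFFERING; the cast below is an undocumented assumption, and it would impose sector-aligned buffer, offset and length constraints:

    // Hypothetical sketch, not my production code.
    // FILE_FLAG_NO_BUFFERING has no documented FileOptions equivalent;
    // this undocumented cast is an assumption and requires writes that
    // are sector-aligned multiples (e.g. of 4096 bytes).
    const FileOptions NoBuffering = (FileOptions)0x20000000;

    using (var destinationStream = new FileStream(destinationFullPath,
        FileMode.Create, FileAccess.Write, FileShare.None, 4096,
        FileOptions.Asynchronous | FileOptions.WriteThrough | NoBuffering))
    {
        // every WriteAsync would then need an aligned buffer whose length
        // is a multiple of the sector size
    }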

Here is my current code for the async copy. I kept the default buffer size of 81920 bytes, i.e. 10 * 1024 * sizeof(Int64).

I am working with NTFS file systems, thus 4096-byte clusters.

EDIT: I updated the code with SetLength as suggested, added FileOptions.Asynchronous when creating the destinationStream, and fixed setting the attributes AFTER setting the times (otherwise an exception is thrown for ReadOnly files):

        bool operationCanceled = false;
        int bufferSize = 81920; // default CopyToAsync buffer size: 10 * 1024 * sizeof(Int64)
        try
        {
            using (FileStream sourceStream = source.OpenRead())
            {
                // Remove existing file first
                if (File.Exists(destinationFullPath))
                    File.Delete(destinationFullPath);

                using (FileStream destinationStream = File.Create(destinationFullPath, bufferSize, FileOptions.Asynchronous))
                {
                    try
                    {                             
                        destinationStream.SetLength(sourceStream.Length); // avoid file fragmentation!
                        await sourceStream.CopyToAsync(destinationStream, bufferSize, cancellationToken);
                    }
                    catch (OperationCanceledException)
                    {
                        operationCanceled = true;
                    }
                } // properly disposed after the catch
            }
        }
        catch (IOException e)
        {
            actionOnException(e, "error copying " + source.FullName);
        }

        if (operationCanceled)
        {
            // Remove the partially written file
            if (File.Exists(destinationFullPath))
                File.Delete(destinationFullPath);
        }
        else
        {
            // Copy meta data (attributes and time) from source once the copy is finished
            File.SetCreationTimeUtc(destinationFullPath, source.CreationTimeUtc);
            File.SetLastWriteTimeUtc(destinationFullPath, source.LastWriteTimeUtc);
            File.SetAttributes(destinationFullPath, source.Attributes); // after set time if ReadOnly!
        }

I also fear that the File.SetAttributes and time calls at the end of my code could increase file fragmentation.

Is there a proper way to create a 1:1 asynchronous file copy without any file fragmentation, i.e. asking the HDD that the file stream gets only contiguous sectors?

Other topics regarding file fragmentation, like How can I limit file fragmentation while working with .NET, suggest incrementing the file size in larger chunks (see the sketch below), but that does not seem to be a direct answer to my question.
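For context, the suggestion in that linked question amounts to growing the destination file in large steps while writing, instead of letting each write extend it by a few clusters. A rough sketch of the idea (the 64 MB step is an arbitrary choice of mine):

    // Sketch of the linked suggestion, not my actual code:
    // grow the file in big increments so NTFS can allocate larger runs.
    const long growthStep = 64L * 1024 * 1024; // arbitrary step size
    var buffer = new byte[81920];
    long written = 0;
    int read;
    while ((read = await sourceStream.ReadAsync(buffer, 0, buffer.Length, cancellationToken)) > 0)
    {
        if (written + read > destinationStream.Length)
            destinationStream.SetLength(Math.Min(
                destinationStream.Length + growthStep, sourceStream.Length));
        await destinationStream.WriteAsync(buffer, 0, read, cancellationToken);
        written += read;
    }
    destinationStream.SetLength(written); // trim any over-allocation at the end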

EricBDev
  • Have you tried `destinationStream.Length = sourceStream.Length;` just before the copy? – Lucas Trzesniewski Jan 05 '17 at 20:58
  • Good idea. Length is a getter only, but the SetLength method does the job. It does indeed seem to avoid fragmentation in a quick test! I also see the FileOptions parameter when I create the destinationStream; I wonder if Asynchronous or WriteThrough could be a good option. – EricBDev Jan 05 '17 at 21:08

3 Answers


"but the SetLength method does the job"

It does not do the job. It only updates the file size in the directory entry; it does not allocate any clusters. The easiest way to see this for yourself is to do it on a very large file, say 100 gigabytes. Note how the call completes instantly. The only way it can be instant is if the file system does not also do the job of allocating and writing the clusters. Reading from the file is actually possible: even though the file contains no actual data, the file system simply returns binary zeros.

This will also mislead any utility that reports fragmentation. Since the file has no clusters, there can be no fragmentation. So it only looks like you solved your problem.
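A minimal sketch to observe this yourself (the size and the temp path are arbitrary; run it on an NTFS volume with enough free space):

    var path = Path.Combine(Path.GetTempPath(), "huge-test.bin"); // throwaway test file
    var sw = System.Diagnostics.Stopwatch.StartNew();
    using (var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite))
    {
        fs.SetLength(10L * 1024 * 1024 * 1024); // "allocate" 10 GB
        Console.WriteLine($"SetLength took {sw.ElapsedMilliseconds} ms"); // completes almost instantly
        fs.Seek(5L * 1024 * 1024 * 1024, SeekOrigin.Begin);
        Console.WriteLine($"Byte at 5 GB reads as {fs.ReadByte()}"); // prints 0, nothing was written
    }
    File.Delete(path);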

The only thing you can do to force the clusters to be allocated is to actually write to the file. It is in fact possible to allocate 100 gigabytes' worth of clusters with a single write: use Seek() to position at Length - 1, then write a single byte with Write(). This will take a while on a very large file; it is in effect no longer async.

The odds that it will reduce fragmentation are not great. You merely reduced the risk somewhat that the writes will be interleaved with writes from other processes. Only somewhat, since the actual writing is done lazily by the file system cache. The core issue is that the volume was fragmented before you began writing, and it will never be less fragmented after you're done.

The best thing to do is to just not fret about it. Defragging is automatic on Windows these days and has been since Vista. Maybe you want to play with the scheduling, or maybe you want to ask more about it at superuser.com.

Hans Passant
  • "This will also mislead any utility that reports fragmentation. Since the file has no clusters, there can be no fragmentation" But the file get written eventually. Just did a test again with a 4 GB file, occupying 16k clusters: all appears to be contiguous in the ClusterView of the Defrag Tool. – EricBDev Jan 07 '17 at 00:21
  • Please see my corresponding answer; is it what you meant? As written, it seems as 'instant' as SetLength() and does not seem to create a performance penalty. But it does not guarantee either that all clusters are contiguous. I just tested copying a 60 GB file onto a partition with only 90 GB available. The 60 GB got copied, but in 3 fragments, since my disk did NOT have 60 GB of contiguous free space (some clusters were occupied in the middle)! – EricBDev Jan 07 '17 at 01:45
  • As commented in my answer above, the seek+write strategy DID a better job than SetLength with a 100 GB VM copied: one piece with seek+write, whereas SetLength() gave 3 fragments! – EricBDev Jan 10 '17 at 14:05

I think FileStream.SetLength is what you need.

Yury Glushkov
  • I also came to that solution with Lucas's comment. It reduces the fragmentation a lot, though not completely; I still have a few files fragmented after the copy. Not a big deal compared to the previous state, but I wonder if I could do even better. Can we guarantee no fragmentation? – EricBDev Jan 06 '17 at 09:34
  • You can only guarantee that when you format the disk before each copy operation. – H H Jan 06 '17 at 10:24
  • @HenkHolterman you are right, but on the other hand, it's possible to reduce fragmentation in case of multiple parallel writes – Yury Glushkov Jan 06 '17 at 11:34
  • @HansPassant but in the provided case, it will happen on the first write in the async statement – Yury Glushkov Jan 06 '17 at 14:57
  • @HansPassant oh, it's very interesting! Thank you! – Yury Glushkov Jan 06 '17 at 14:59
  • @HansPassant, I'm not sure what you mean by "the problem is imaginary". Using this code in my program and checking file fragmentation with "O&O Defrag Free Edition", for a 600 MB file I get: a) without the FileStream.SetLength line: 90000 fragments!! b) with FileStream.SetLength(Length) or FileStream.SetLength(Length-1): no fragmentation, only one piece! – EricBDev Jan 06 '17 at 22:57
  • Looking at the implementation of SetLengthCore in https://referencesource.microsoft.com/#mscorlib/system/io/filestream.cs,d6c30590c2fd88be gives some hints, with the SeekCore calls and the Win32Native.SetEndOfFile(_handle) call. But I don't really see why SetLength(Length - 1) would be better than SetLength(Length). – EricBDev Jan 06 '17 at 23:16

Considering Hans Passant's answer, in my code above an alternative to

destinationStream.SetLength(sourceStream.Length);

would be, if I understood it properly:

byte[] writeOneZero = {0};
destinationStream.Seek(sourceStream.Length - 1, SeekOrigin.Begin); // jump to the last byte
destinationStream.Write(writeOneZero, 0, 1);                       // force the clusters to be allocated
destinationStream.Seek(0, SeekOrigin.Begin);                       // rewind before the actual copy

It does indeed seem to consolidate the copy.

But a look at the source code of FileStream.SetLengthCore suggests it does almost the same thing, seeking to the end but without writing a byte:

    private void SetLengthCore(long value)
    {
        Contract.Assert(value >= 0, "value >= 0");
        long origPos = _pos;

        if (_exposedHandle)
            VerifyOSHandlePosition();
        if (_pos != value)
            SeekCore(value, SeekOrigin.Begin);
        if (!Win32Native.SetEndOfFile(_handle)) {
            int hr = Marshal.GetLastWin32Error();
            if (hr==__Error.ERROR_INVALID_PARAMETER)
                throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_FileLengthTooBig"));
            __Error.WinIOError(hr, String.Empty);
        }
        // Return file pointer to where it was before setting length
        if (origPos != value) {
            if (origPos < value)
                SeekCore(origPos, SeekOrigin.Begin);
            else
                SeekCore(0, SeekOrigin.End);
        }
    }

Anyway, these methods won't guarantee zero fragmentation, but they at least avoid it in most cases. The automatic defragmentation tool can then finish the job at a low performance cost. My initial code, without these Seek calls, created hundreds of thousands of fragments for a 1 GB file, slowing down my machine when the defragmentation tool went active.
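For completeness: if true preallocation without writing is required, Windows exposes SetFileValidData in kernel32 (the API behind fsutil file setvaliddata). A hedged P/Invoke sketch; it assumes an elevated process holding the SE_MANAGE_VOLUME_NAME privilege, and since the call can expose stale disk contents it is only safe when, as in a copy, the whole range gets overwritten anyway:

    using System.ComponentModel;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    static class NtfsPreallocation
    {
        // SetFileValidData commits the clusters up to validDataLength without
        // zero-filling them; it requires the SE_MANAGE_VOLUME_NAME privilege.
        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

        public static void Preallocate(FileStream destination, long length)
        {
            destination.SetLength(length); // reserve the size first
            if (!SetFileValidData(destination.SafeFileHandle, length))
                throw new Win32Exception(); // wraps Marshal.GetLastWin32Error()
        }
    }

I have not measured whether this places the clusters more contiguously than seek+write; it only removes the zero-fill cost of extending the valid data length.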

EricBDev
  • I copied a 100 GB VM file yesterday where the target drive had enough space (however, the target is an SSD, where fragmentation is not relevant, so that may change the behavior in the Windows kernel). a) with Windows 10 Explorer copy: the target file had 3 fragments; b) with SetLength(): the same 3 fragments; c) with the code above (writeOneZero, seek+write): only 1 piece. Thus, this seek+write does make sense! – EricBDev Jan 10 '17 at 14:03