We have a NetApp NAS filer that seems to fail from time to time. I am not sure whether this is caused by network issues, heavy load, or the filer itself. The thing is that the usual System.IO.File.Copy(...) call sometimes fails unexpectedly, although it worked a minute before and works again a minute later. The filer serves its shares over CIFS.

In my log4net log files I see this exception:

System.IO.IOException: The specified network name is no longer available. at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) ...

The network team is unsure what happens and why. I am now wondering whether I can implement a simple retry mechanism that copies the file and retries the copy if it fails. It could be that System.IO.File.Copy was designed for normal NTFS drives or stable network storage rather than for CIFS storage.

Are there common patterns or .NET classes suitable to do this copy and retry or should I simply use an approach like in the following pseudo-code?

int count = 0;
bool copied = false;

while (!copied && count < 5)
{
  count++;

  try
  {
    //here copy the file
    ...

    //if no exception copy was ok
    copied = true;
  }
  catch
  {
    if(count >= 5)
    {
      // Log that retry limit has been reached...
    }
    else
    {
      // make thread to wait for some time,
      // waiting time can be in function of count or fixed...
    }
  }
} 
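
The pseudo-code above could be turned into runnable C# roughly as follows. This is only a minimal sketch: `RetryingCopy` and `TryCopy` are hypothetical names (not a .NET class), the copy itself is passed in as a delegate, and it assumes that only `IOException` is worth retrying.

```csharp
using System;
using System.IO;
using System.Threading;

public static class RetryingCopy
{
    // Hypothetical helper, not a framework API: runs `copy` up to maxAttempts
    // times and waits delayForAttempt(attempt) after each failed attempt.
    public static bool TryCopy(Action copy, int maxAttempts,
        Func<int, TimeSpan> delayForAttempt, Action<string> log)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                copy();
                return true; // no exception, so the copy succeeded
            }
            catch (IOException ex)
            {
                if (attempt == maxAttempts)
                {
                    log("Retry limit reached: " + ex.Message);
                    return false;
                }

                log("Attempt " + attempt + " failed: " + ex.Message);
                // The waiting time can be fixed or a function of the attempt number.
                Thread.Sleep(delayForAttempt(attempt));
            }
        }

        return false; // only reached when maxAttempts is less than 1
    }
}
```

It could be called like `RetryingCopy.TryCopy(() => File.Copy(sourcePath, destinationPath, true), 5, attempt => TimeSpan.FromSeconds(attempt), Console.WriteLine)`, where `sourcePath` and `destinationPath` are placeholders. Other exceptions, such as `UnauthorizedAccessException`, are deliberately left uncaught because retrying is unlikely to help with them.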
Davide Piras
  • `System.IO.File.Copy()` boils down to the Win32 [`CopyFile`](http://msdn.microsoft.com/en-us/library/windows/desktop/aa363851.aspx) function. So I would code up something that uses that repeatedly and see if it fails as well. I think you're trying to put a software band-aid on a bigger issue. – Jonathon Reinhart Aug 08 '12 at 19:32
  • Set up a `ping -t hostname` and log it to a file. Let it run for a few hours, then look for timeouts. That should tell you if the problem is network related. It could still be a problem with the device itself, but at least you can rule out a .NET issue. – Jimmy D Aug 08 '12 at 19:32
  • I like that loop. Possibly put a small wait in the catch. – paparazzo Aug 08 '12 at 19:43
  • I've seen this on servers where the drivers were not up to date, where the network router/firewall failed now and then, and on virtual boxes. Are the LUNs connected to the server over dedicated NICs? – rene Aug 08 '12 at 19:44
  • @Blam right, I had it in mind and forgot to write it. The waiting logic should probably be similar to the standard slotted ALOHA waiting system, which increases the waiting time with the number of failed attempts. Editing question... – Davide Piras Aug 08 '12 at 19:59
  • @FrankWhite this is a good idea, I should certainly check for timeouts to identify where the issue is, thanks! – Davide Piras Aug 08 '12 at 20:04
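
A waiting time that grows with the number of failed attempts, as mentioned in the comments, could be computed along these lines. This is a minimal sketch of a plain exponential backoff with a cap, which is one way to do it and not necessarily what the commenters had in mind; `Backoff` and `DelayFor` are hypothetical names.

```csharp
using System;

public static class Backoff
{
    // Hypothetical helper: the wait doubles with each failed attempt
    // (attempt 1 -> baseDelay, 2 -> 2x, 3 -> 4x, ...) and is capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException("attempt");

        // Limit the exponent so very large attempt numbers stay well-behaved.
        int exponent = Math.Min(attempt - 1, 30);
        double milliseconds = baseDelay.TotalMilliseconds * Math.Pow(2, exponent);

        return TimeSpan.FromMilliseconds(
            Math.Min(milliseconds, maxDelay.TotalMilliseconds));
    }
}
```

With a base delay of one second and a cap of 30 seconds, the waits would be 1 s, 2 s, 4 s, 8 s, 16 s and then 30 s for every later attempt. Adding a small random offset to each wait is a common refinement when several processes retry against the same filer.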

2 Answers


The same happens to me. I have an old NAS server, and from time to time Windows shows an error telling me that the drive is no longer accessible.
To manage the file copy process, you could instead use CopyFileEx (from the Windows API), as shown in the following sample:

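using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

public class SecureFileCopy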
{
    public static void CopyFile(FileInfo source, FileInfo destination, 
        CopyFileOptions options, CopyFileCallback callback, object state)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (destination == null) 
            throw new ArgumentNullException("destination");
        if ((options & ~CopyFileOptions.All) != 0) 
            throw new ArgumentOutOfRangeException("options");

        new FileIOPermission(
            FileIOPermissionAccess.Read, source.FullName).Demand();
        new FileIOPermission(
            FileIOPermissionAccess.Write, destination.FullName).Demand();

        CopyProgressRoutine cpr = callback == null ? 
            null : new CopyProgressRoutine(new CopyProgressData(
                source, destination, callback, state).CallbackHandler);

        bool cancel = false;
        if (!CopyFileEx(source.FullName, destination.FullName, cpr, 
            IntPtr.Zero, ref cancel, (int)options))
        {
            throw new IOException(new Win32Exception().Message);
        }
    }

    private class CopyProgressData
    {
        private FileInfo _source = null;
        private FileInfo _destination = null;
        private CopyFileCallback _callback = null;
        private object _state = null;

        public CopyProgressData(FileInfo source, FileInfo destination, 
            CopyFileCallback callback, object state)
        {
            _source = source; 
            _destination = destination;
            _callback = callback;
            _state = state;
        }

        public int CallbackHandler(
            long totalFileSize, long totalBytesTransferred, 
            long streamSize, long streamBytesTransferred, 
            int streamNumber, int callbackReason,
            IntPtr sourceFile, IntPtr destinationFile, IntPtr data)
        {
            return (int)_callback(_source, _destination, _state, 
                totalFileSize, totalBytesTransferred);
        }
    }

    private delegate int CopyProgressRoutine(
        long totalFileSize, long TotalBytesTransferred, long streamSize, 
        long streamBytesTransferred, int streamNumber, int callbackReason,
        IntPtr sourceFile, IntPtr destinationFile, IntPtr data);

    [SuppressUnmanagedCodeSecurity]
    [DllImport("Kernel32.dll", CharSet=CharSet.Auto, SetLastError=true)]
    private static extern bool CopyFileEx(
        string lpExistingFileName, string lpNewFileName,
        CopyProgressRoutine lpProgressRoutine,
        IntPtr lpData, ref bool pbCancel, int dwCopyFlags);
}

public delegate CopyFileCallbackAction CopyFileCallback(
    FileInfo source, FileInfo destination, object state, 
    long totalFileSize, long totalBytesTransferred);

public enum CopyFileCallbackAction
{
    Continue = 0,
    Cancel = 1,
    Stop = 2,
    Quiet = 3
}

[Flags]
public enum CopyFileOptions
{
    None = 0x0,
    FailIfDestinationExists = 0x1,
    Restartable = 0x2,
    AllowDecryptedDestination = 0x8,
    All = FailIfDestinationExists | Restartable | AllowDecryptedDestination
}

There's a more extensive description in MSDN Magazine.

Jaime Oro
  • This could be an idea, thanks! My process is a console application triggered by a Windows Scheduled Task on a server, so there is no user to watch it and I don't need feedback like a progress bar or cancel buttons. But if this works anyway, all the better. I will give it a try. – Davide Piras Aug 08 '12 at 20:03
  • If you have control over what's happening, maybe you can automatically postpone the copy of the files that were problematic on the first try and wait one hour before retrying. – Jaime Oro Aug 08 '12 at 20:07

After weeks and weeks of research, tests and pain, I finally seem to have found a working solution. I decided to replace the System.IO.File.Copy call with an invocation of Microsoft's Robocopy command, which is available in Windows Server 2008 R2 and seemed to work well from the first try. This reassures me that I am not reinventing the wheel but using a tested technology designed exactly for my needs. Thanks, everybody, for your answers and comments.
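
For illustration, invoking Robocopy from a console application might look roughly like this. It is a sketch and not the exact code from my process: `RobocopyRunner` and its methods are hypothetical names, and the retry count and wait are passed through Robocopy's `/R:n` and `/W:n` options (Robocopy takes a source directory, a destination directory, and then the file name).

```csharp
using System;
using System.Diagnostics;

public static class RobocopyRunner
{
    // Hypothetical helper: builds
    // "<sourceDir>" "<destinationDir>" "<fileName>" /R:<retries> /W:<waitSeconds>
    public static string BuildArguments(string sourceDir, string destinationDir,
        string fileName, int retries, int waitSeconds)
    {
        return string.Format("{0} {1} {2} /R:{3} /W:{4}",
            Quote(sourceDir), Quote(destinationDir), Quote(fileName),
            retries, waitSeconds);
    }

    // A trailing backslash would escape the closing quote, so it is trimmed.
    // A drive root such as C:\ is not handled by this sketch.
    private static string Quote(string value)
    {
        return "\"" + value.TrimEnd('\\') + "\"";
    }

    public static bool Copy(string sourceDir, string destinationDir,
        string fileName, int retries, int waitSeconds)
    {
        var startInfo = new ProcessStartInfo("robocopy",
            BuildArguments(sourceDir, destinationDir, fileName, retries, waitSeconds))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            // Robocopy exit codes of 8 or higher indicate that at least one copy failed.
            return process.ExitCode < 8;
        }
    }
}
```

Setting `/R` and `/W` explicitly seems worthwhile in a scheduled task, because Robocopy's defaults are one million retries with a 30 second wait between them.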

Davide Piras