40

In the upcoming Java 7, there is a new API to check whether two file objects refer to the same file.

Is there a similar API in the .NET Framework?

I've searched MSDN, but found nothing helpful.

I want it to be simple, but I don't want to compare by filename, which causes problems with hard/symbolic links and different path styles (e.g. \\?\C:\ vs. C:\).

All I want to do is prevent the same file from being dragged and dropped into my link list twice.

Maxence
Dennis C

7 Answers

34

As far as I can see (1) (2) (3) (4), JDK7 does it by calling GetFileInformationByHandle on the files and comparing dwVolumeSerialNumber, nFileIndexHigh and nFileIndexLow.

Per MSDN:

You can compare the VolumeSerialNumber and FileIndex members returned in the BY_HANDLE_FILE_INFORMATION structure to determine if two paths map to the same target; for example, you can compare two file paths and determine if they map to the same directory.

I do not think this function is wrapped by .NET, so you will have to use P/Invoke.
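Whatever P/Invoke wrapper supplies the values, the comparison MSDN describes fits in a small key type, which also suits the original goal of rejecting duplicate drops. This is a minimal sketch: FileId and From are hypothetical names, and the raw field values are assumed to come from GetFileInformationByHandle. MSDN also notes that the file index is only guaranteed stable while some process has the file open, so keys should be compared while the handles are still held.

```csharp
// Hypothetical key type (not part of .NET): identifies a file by the three
// BY_HANDLE_FILE_INFORMATION fields that MSDN says to compare. The raw values
// are assumed to come from a P/Invoke call to GetFileInformationByHandle.
public readonly record struct FileId(uint VolumeSerialNumber, ulong FileIndex)
{
    // Packs nFileIndexHigh and nFileIndexLow into a single 64-bit index.
    public static FileId From(uint volumeSerialNumber, uint fileIndexHigh, uint fileIndexLow) =>
        new FileId(volumeSerialNumber, ((ulong)fileIndexHigh << 32) | fileIndexLow);
}
```

Because a record struct gets value equality and hashing for free, a HashSet<FileId> whose Add returns false identifies a file that is already in the list.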

It might or might not work for network files. According to MSDN:

Depending on the underlying network components of the operating system and the type of server connected to, the GetFileInformationByHandle function may fail, return partial information, or full information for the given file.

A quick test shows that it works as expected (same values) with a symbolic link on a Linux system connected using SMB/Samba, but that it cannot detect that a file is the same when accessed using different shares that point to the same file (FileIndex is the same, but VolumeSerialNumber differs).

Rasmus Faber
  • This definitely looks like the way to go. MSDN says that those three fields uniquely identify a file. You'll need to use the Win32 API to get to them, though. – Dana Robinson Jan 04 '09 at 10:41
  • Thanks, I added the MSDN citation. – Rasmus Faber Jan 04 '09 at 10:44
  • I doubt it would return the same result for the same file accessed through different shares though; you should check if \\server\share1\file is the same as \\server\share2\subdirectory\file when the files are really the same. – configurator Jan 04 '09 at 12:01
  • Thanks Rasmus for the link to java.nio.file.FileRef (which you gave me in another question). I went after your links here but it wasn't one of them. I think it would be useful here too (maybe as #4). – Hosam Aly Jan 04 '09 at 13:20
  • Well, FileRef is just an interface (which WindowsPath implements), so I did not think there would be any interesting things to see in the source code. But if you think it will be worthwhile, I will add it. – Rasmus Faber Jan 04 '09 at 14:42
  • I found this: http://stackoverflow.com/questions/271398/post-your-extension-goodies-for-c-net-codeplex-com-extensionoverflow?answer=274652#274652 (Haven't tested it myself, but it uses GetFileInformationByHandle) – tuinstoel Jan 07 '09 at 19:55
  • One thing I haven't seen mentioned: if you have a hard link, then it is viewed as a different file even though the content for both files is the exact same. Something to keep in mind. Whether that's important is very much dependent on what you do with the files. – Alexis Wilke Sep 18 '12 at 18:54
8

Edit: Note that @Rasmus Faber mentions the GetFileInformationByHandle function in the Win32 API, which does what you want; check and upvote his answer for more information.


I think you need an OS function to give you the information you want; otherwise, you're going to get some false negatives whatever you do.

For instance, do these refer to the same file?

  • \\server\share\path\filename.txt
  • \\server\d$\temp\path\filename.txt

I would examine how critical it is for you to not have duplicate files in your list, and then just make a best effort.

Having said that, there is a method in the Path class that can do some of the work: Path.GetFullPath. It will at least expand the path to long names, according to the existing structure. Afterwards, you just compare the strings. It won't be foolproof, though, and it won't handle the two paths in my example above.
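A minimal sketch of that best-effort comparison, assuming a case-insensitive file system (the NTFS default), which is why it compares with StringComparison.OrdinalIgnoreCase and so also treats upper- and lower-case spellings of a path as equal. PathCompare and LooksLikeSamePath are hypothetical names. It still cannot see through hard links, symbolic links, junctions or different shares.

```csharp
using System;
using System.IO;

public static class PathCompare
{
    // Best-effort check only (hypothetical helper): expands both paths with
    // Path.GetFullPath and compares them case-insensitively, assuming a
    // case-insensitive file system such as NTFS with default settings.
    // It cannot detect hard links, symbolic links, junctions or different
    // shares/drive letters that lead to the same file.
    public static bool LooksLikeSamePath(string path1, string path2)
    {
        string full1 = Path.GetFullPath(path1);
        string full2 = Path.GetFullPath(path2);
        return string.Equals(full1, full2, StringComparison.OrdinalIgnoreCase);
    }
}
```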

Community
Lasse V. Karlsen
  • The documentation also says: "Otherwise, this method checks if both FileRefs locate the same file, and depending on the implementation, may require to open or access both files." I am actually very interested in seeing how this can be done! – Hosam Aly Jan 04 '09 at 09:44
  • Using Path.GetFullPath doesn't work, try if (Path.GetFullPath(@"c:\vobp.log") == Path.GetFullPath(@"c:\vobp.log".ToUpper())) {} – tuinstoel Jan 04 '09 at 10:00
  • Yeah, notice that I said *some* of the work; there is no method in .NET that will do it all for you. – Lasse V. Karlsen Jan 04 '09 at 10:48
5

Here is a C# implementation of IsSameFile using GetFileInformationByHandle:

NativeMethods.cs

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

public static class NativeMethods
{
  // Managed mirror of the Win32 BY_HANDLE_FILE_INFORMATION structure.
  [StructLayout(LayoutKind.Explicit)]
  public struct BY_HANDLE_FILE_INFORMATION
  {
    [FieldOffset(0)]
    public uint FileAttributes;

    [FieldOffset(4)]
    public FILETIME CreationTime;

    [FieldOffset(12)]
    public FILETIME LastAccessTime;

    [FieldOffset(20)]
    public FILETIME LastWriteTime;

    [FieldOffset(28)]
    public uint VolumeSerialNumber;

    [FieldOffset(32)]
    public uint FileSizeHigh;

    [FieldOffset(36)]
    public uint FileSizeLow;

    [FieldOffset(40)]
    public uint NumberOfLinks;

    [FieldOffset(44)]
    public uint FileIndexHigh;

    [FieldOffset(48)]
    public uint FileIndexLow;
  }

  [DllImport("kernel32.dll", SetLastError = true)]
  [return: MarshalAs(UnmanagedType.Bool)]
  public static extern bool GetFileInformationByHandle(SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION lpFileInformation);

  // The managed enums are passed through as raw DWORDs; their values line up with
  // the Win32 constants used here: FileAccess.Read (1) = FILE_READ_DATA,
  // FileShare.ReadWrite (3) = FILE_SHARE_READ | FILE_SHARE_WRITE,
  // FileMode.Open (3) = OPEN_EXISTING.
  [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
  public static extern SafeFileHandle CreateFile([MarshalAs(UnmanagedType.LPTStr)] string filename,
    [MarshalAs(UnmanagedType.U4)] FileAccess access,
    [MarshalAs(UnmanagedType.U4)] FileShare share,
    IntPtr securityAttributes,
    [MarshalAs(UnmanagedType.U4)] FileMode creationDisposition,
    [MarshalAs(UnmanagedType.U4)] FileAttributes flagsAndAttributes,
    IntPtr templateFile);
}

PathUtility.cs

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class PathUtility
{
  // Returns true if both paths refer to the same file (Windows only, regular files
  // only: opening a directory would also need FILE_FLAG_BACKUP_SEMANTICS).
  // Both handles stay open while comparing, because the file index is only
  // guaranteed to be stable while the file is open.
  public static bool IsSameFile(string path1, string path2)
  {
    using (SafeFileHandle sfh1 = NativeMethods.CreateFile(path1, FileAccess.Read, FileShare.ReadWrite,
        IntPtr.Zero, FileMode.Open, 0, IntPtr.Zero))
    {
      if (sfh1.IsInvalid)
        Marshal.ThrowExceptionForHR(Marshal.GetHRForLastWin32Error());

      using (SafeFileHandle sfh2 = NativeMethods.CreateFile(path2, FileAccess.Read, FileShare.ReadWrite,
        IntPtr.Zero, FileMode.Open, 0, IntPtr.Zero))
      {
        if (sfh2.IsInvalid)
          Marshal.ThrowExceptionForHR(Marshal.GetHRForLastWin32Error());

        NativeMethods.BY_HANDLE_FILE_INFORMATION fileInfo1;
        if (!NativeMethods.GetFileInformationByHandle(sfh1, out fileInfo1))
          throw new IOException(string.Format("GetFileInformationByHandle has failed on {0}", path1));

        NativeMethods.BY_HANDLE_FILE_INFORMATION fileInfo2;
        if (!NativeMethods.GetFileInformationByHandle(sfh2, out fileInfo2))
          throw new IOException(string.Format("GetFileInformationByHandle has failed on {0}", path2));

        // Same volume and same 64-bit file index => same file.
        return fileInfo1.VolumeSerialNumber == fileInfo2.VolumeSerialNumber
          && fileInfo1.FileIndexHigh == fileInfo2.FileIndexHigh
          && fileInfo1.FileIndexLow == fileInfo2.FileIndexLow;
      }
    }
  }
}
Maxence
2

Answer: There is no foolproof way to compare two string-based paths to determine whether they point to the same file.

The main reason is that seemingly unrelated paths can point to the exact same file due to file system redirections (junctions, symbolic links, etc.). For example:

"d:\temp\foo.txt" "c:\othertemp\foo.txt"

These paths can potentially point to the same file. This case clearly eliminates any string comparison function as a basis for determining if two paths point to the same file.

The next level is comparing the OS file information: open the files at both paths and compare the information for their handles. In Windows this can be done with GetFileInformationByHandle. Lucian Wischik wrote an excellent post on this subject here.

There is still a problem with this approach, though. It only works if the user account performing the check is able to open both files for reading. Numerous things can prevent a user from opening one or both files, including but not limited to:

  • Lack of sufficient permissions to file
  • Lack of sufficient permissions to a directory in the path of the file
  • A file system change that occurs between the opening of the first file and the second, such as a network disconnection.
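One way to keep these failure cases visible to the caller, rather than silently reporting "different file", is to wrap the identity check and return a nullable result. This is a hypothetical sketch: SameFileCheck and TryIsSameFile are invented names, the check itself is passed in (for example an IsSameFile built on GetFileInformationByHandle), and treating UnauthorizedAccessException and IOException as "unknown" is an assumption about which exceptions that check throws.

```csharp
using System;
using System.IO;

public static class SameFileCheck
{
    // Hypothetical wrapper: runs an identity check and returns null ("unknown")
    // when one of the files cannot be opened, instead of guessing true or false.
    public static bool? TryIsSameFile(string path1, string path2,
                                      Func<string, string, bool> isSameFile)
    {
        try
        {
            return isSameFile(path1, path2);
        }
        catch (UnauthorizedAccessException)
        {
            // Missing permission on the file or on a directory in its path.
            return null;
        }
        catch (IOException)
        {
            // For example, the file was not found (FileNotFoundException).
            return null;
        }
    }
}
```

A drag-and-drop handler could then decide explicitly what "unknown" means, for instance accepting the file but logging that duplicate detection was not possible.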

When you start looking at all of these problems you begin to understand why Windows does not provide a method to determine if two paths are the same. It's just not an easy/possible question to answer.

JaredPar
  • The documentation for GetFileInformationByHandle says: "nFileIndexLow: Low-order part of a unique identifier that is associated with a file. This value is useful ONLY WHILE THE FILE IS OPEN by at least one process. If no processes have it open, the index may change the next time the file is opened." – Integer Poet Jul 27 '10 at 22:58
1

At first I thought it was really easy, but this doesn't work:

  string fileName1 = @"c:\vobp.log";
  string fileName2 = @"c:\vobp.log".ToUpper();
  FileInfo fileInfo1 = new FileInfo(fileName1);
  FileInfo fileInfo2 = new FileInfo(fileName2);

  if (!fileInfo1.Exists || !fileInfo2.Exists)
  {
    throw new Exception("one of the files does not exist");
  }

  if (fileInfo1.FullName == fileInfo2.FullName)
  {
    MessageBox.Show("equal"); 
  }

Maybe this library helps http://www.codeplex.com/FileDirectoryPath. I haven't used it myself.

edit: See this example on that site:

  //
  // Path comparison
  //
  filePathAbsolute1 = new FilePathAbsolute(@"C:/Dir1\\File.txt");
  filePathAbsolute2 = new FilePathAbsolute(@"C:\DIR1\FILE.TXT");
  Debug.Assert(filePathAbsolute1.Equals(filePathAbsolute2));
  Debug.Assert(filePathAbsolute1 == filePathAbsolute2);
tuinstoel
0

If you need to compare the same filenames over and over again, I would suggest you look into canonicalizing those names.

On a Unix system, there is the realpath() function, which canonicalizes your path. I think that's generally the best bet if you have a complex path. However, it is likely to fail on volumes mounted via network connections.

However, building on the realpath() approach, if you want to support multiple volumes, including network volumes, you could write your own function that checks each directory name in a path and, if it references a volume, determines whether the volume reference in both paths is the same. That being said, the mount point may differ (i.e. the path on the destination volume may not be the root of that volume), so it is not easy to solve every problem along the way, but it is definitely possible (otherwise, how would it work in the first place?!)

Once the filenames are properly canonicalized, a simple string comparison gives you the correct answer.
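In .NET, a partial approximation of realpath() can be sketched with File.ResolveLinkTarget, assuming .NET 6 or later. PathCanon and Canonicalize are hypothetical names. Unlike realpath(), this only follows a link on the final path component: links in intermediate directories, hard links and network mappings are not resolved, so it narrows the problem rather than solving it.

```csharp
using System.IO;

public static class PathCanon
{
    // Partial realpath()-style canonicalization (assumes .NET 6 or later):
    // makes the path absolute and, if the file itself is a symbolic link,
    // follows the chain to its final target. Links in intermediate
    // directories, hard links and network mappings are NOT resolved.
    public static string Canonicalize(string path)
    {
        string full = Path.GetFullPath(path);
        // ResolveLinkTarget returns null when 'full' is not a link.
        var target = File.ResolveLinkTarget(full, returnFinalTarget: true);
        return target?.FullName ?? full;
    }
}
```

The canonical strings can then be cached, e.g. in a HashSet<string> (with StringComparer.OrdinalIgnoreCase on Windows), which is where this approach pays off when the same names are compared repeatedly.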

Rasmus's answer is probably the fastest way if you don't need to compare the same filenames over and over again.

Alexis Wilke
-4

You could always compute an MD5 hash of both files and compare the results. Not exactly efficient, but easier than manually comparing the files yourself.

Here is a post on how to MD5 a string in C#.
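For completeness, hashing a file's contents with MD5 might look like the sketch below (ContentHash and Md5OfFile are hypothetical names; Convert.ToHexString assumes .NET 5 or later). Equal hashes only mean equal contents: two separate copies of a file hash the same, so this cannot tell whether two paths refer to one file.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentHash
{
    // Hypothetical helper: MD5 of a file's *contents* as an uppercase hex string.
    // Equal hashes mean equal contents, not the same file on disk.
    public static string Md5OfFile(string path)
    {
        using var md5 = MD5.Create();
        using var stream = File.OpenRead(path);
        return Convert.ToHexString(md5.ComputeHash(stream));
    }
}
```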

Soviut
  • Why do an MD5? He can simply compare the contents. It would take the same time on positives, and would fail sooner on most negatives. – configurator Jan 04 '09 at 09:59
  • Also, this wouldn't be able to tell apart copies of the same file. – configurator Jan 04 '09 at 10:00
  • Good points. I was mainly going on the basis that the MD5 results could be cached and compared more easily as new results are dragged in. – Soviut Jan 04 '09 at 11:53
  • You may have misunderstood my question. I am not asking whether the contents of two files are the same, but whether two file paths refer to the same file: if I lock or modify one, the other will be affected. – Dennis C Jan 04 '09 at 14:19