13

Given two paths, I have to check whether they point to the same file. On Unix this can be done with os.path.samefile, but as the documentation states, it's not available on Windows. What's the best way to emulate this function? It doesn't need to handle the general case. In my case the following simplifications apply:

  • Paths don't contain symbolic links.
  • Files are in the same local disk.

Now I use the following:

import os

def samefile(path1, path2):
    return os.path.normcase(os.path.normpath(path1)) == \
           os.path.normcase(os.path.normpath(path2))

Is this OK?

tshepang
  • 2
    `>>> os.path.normcase(os.path.normpath(r"c:\users\aayoubi\desktop")) 'c:\\users\\aayoubi\\desktop'` i couldn't find cases where this would fail. – aayoubi Jan 17 '12 at 10:18
  • I just found one example. 'c:\\one\two' and 'c:\\one\two\' can point to the same directory, but this method would say they're different. – Nikolay Polivanov Jan 17 '12 at 10:29
  • 1
    both outputs were the same: `>>> os.path.normcase(os.path.normpath(r"c:\\one\two")) 'c:\\one\\two'` `>>> os.path.normcase(os.path.normpath(r"c:\\one\two\\")) 'c:\\one\\two'` – aayoubi Jan 17 '12 at 10:36
  • Would you need to be able to handle network paths? e.g. (\\127.0.0.1\c$\test is equivalent to c:\test) – Shawabawa Jan 17 '12 at 12:27
  • @Shawabawa, no. I mentioned that files are in the same _local_ disk. – Nikolay Polivanov Jan 17 '12 at 12:54
  • Per [the official docs](https://docs.python.org/3/library/os.path.html#os.path.samefile), `os.path.samefile` is available for Windows as of Python 3.2. – Ray Feb 28 '19 at 06:20
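
The trailing-separator case raised in the comments above can be checked on any platform by calling ntpath directly, since that is the module os.path resolves to on Windows. This is only a minimal sketch of the question's approach; the name samefile_by_path is made up for illustration, and the comparison is purely textual (it never touches the file system).

```python
import ntpath  # the Windows flavour of os.path, importable on every platform


def samefile_by_path(path1, path2):
    """Compare two Windows-style paths textually (hypothetical helper)."""
    # normpath collapses redundant separators and drops a trailing one;
    # normcase lowercases the path and turns "/" into "\".
    norm1 = ntpath.normcase(ntpath.normpath(path1))
    norm2 = ntpath.normcase(ntpath.normpath(path2))
    return norm1 == norm2


# Trailing separator and different case still compare equal.
print(samefile_by_path("c:\\one\\two", "C:\\One\\Two\\"))
```

Because nothing is looked up on disk, this only holds under the simplifications stated in the question (no symbolic links, same local disk).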

4 Answers

5

According to issue #5985, os.path.samefile and os.path.sameopenfile are now available in py3k. I verified this on Python 3.3.0.
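
One way to take advantage of this while still supporting older interpreters is to use os.path.samefile when it exists and fall back to a path comparison otherwise. The sketch below assumes the simplifications from the question for the fallback branch; same_file is a hypothetical name, not part of the standard library.

```python
import os


def same_file(path1, path2):
    """Return True if both paths refer to the same file (hypothetical helper)."""
    if hasattr(os.path, "samefile"):
        # Present on Unix, and on Windows in the Python 3 versions noted above.
        # Raises OSError if either path does not exist.
        return os.path.samefile(path1, path2)
    # Fallback: textual comparison, valid only without symbolic links
    # or hard links and with both paths on the same local disk.
    norm1 = os.path.normcase(os.path.abspath(path1))
    norm2 = os.path.normcase(os.path.abspath(path2))
    return norm1 == norm2
```

Note that the two branches differ for missing files: os.path.samefile raises, while the fallback simply compares strings.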

For older versions of Python here's a way which uses the GetFileInformationByHandle function:

see_if_two_files_are_the_same_file

Vanja
4

The real use-case of os.path.samefile is not symbolic links, but hard links. os.path.samefile(a, b) returns True if a and b are both hard links to the same file. They might not have the same path.
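
To illustrate, here is a small sketch that creates a hard link in a temporary directory and compares the two names both ways. It assumes Python 3 and a file system that supports hard links; hard_link_demo and the file names are made up for the example.

```python
import os
import tempfile


def hard_link_demo():
    """Return (same_by_path, same_by_file) for a file and its hard link."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        alias = os.path.join(tmp, "alias.txt")
        with open(original, "w") as f:
            f.write("data")
        # Create a second directory entry for the same underlying file.
        os.link(original, alias)
        # Textual comparison sees two different paths...
        same_by_path = (os.path.normcase(os.path.normpath(original)) ==
                        os.path.normcase(os.path.normpath(alias)))
        # ...while samefile compares the file identity.
        same_by_file = os.path.samefile(original, alias)
        return same_by_path, same_by_file


print(hard_link_demo())
```

So a path-based emulation is only safe when hard links can be ruled out as well.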

asmeurer
4

The os.stat system call returns a tuple with a lot of information about each file, including creation and last-modification timestamps, size, and file attributes. The chances of different files having the same parameters are very slim. I think it is very reasonable to do:

import os

def samefile(file1, file2):
    return os.stat(file1) == os.stat(file2)
jsbueno
  • 3
    I guess it's technically possible for the file to be modified in between the two calls to stat. Comparing the path like he does in the question wouldn't have this problem – Shawabawa Jan 17 '12 at 12:24
  • 1
    Yes, for random files such chances are very small. But I have a bunch of semi-automatically created files and many of them have the same size and time stamps. – Nikolay Polivanov Jan 17 '12 at 13:03
  • I think that approach is likely to lead to hard-to-find bugs. This kind of case can occur when, for example, an archiver unpacks lots of files all with the same timestamps. If they are zero-byte files, then one could end up with lots of false matches... – David Fraser Dec 22 '14 at 12:21
  • I believe this is essentially how `samefile` is implemented. Except it only needs to compare `st_dev` and `st_ino` to _know_ if they are the same file or not. – Jesse Chisholm Aug 05 '19 at 22:35
1

I know this is a late answer in this thread, but I use Python on Windows and ran into this issue today. I found this thread, and found that os.path.samefile doesn't work for me.

So, to answer the OP, this is how I emulate os.path.samefile:

# Some versions of Python do not have os.path.samefile,
#   particularly on Windows. :(
#
import os

def os_path_samefile(pathA, pathB):
  # Paths that are not existing regular files never match.
  statA = os.stat(pathA) if os.path.isfile(pathA) else None
  if not statA:
    return False
  statB = os.stat(pathB) if os.path.isfile(pathB) else None
  if not statB:
    return False
  # Same device and same inode (file index) means the same file.
  return (statA.st_dev == statB.st_dev) and (statA.st_ino == statB.st_ino)

It is not as tight as possible, because I was more interested in being clear in what I was doing.

I tested this on Windows 10, using Python 2.7.15.

Jesse Chisholm