The filecmp.cmp documentation states (emphasis mine):

Compare the files named f1 and f2, returning True if they *seem* equal, False otherwise.
If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.

What does "seem" mean here? My understanding is that with shallow=False the contents of the files are compared, so the files are unambiguously either the same or not.

WoJ

The answers to the question [`filecmp.cmp()` ignoring differing `os.stat()` signatures?](https://stackoverflow.com/questions/8045564/filecmp-cmp-ignoring-differing-os-stat-signatures) I once asked may clarify things for you. – martineau Mar 15 '21 at 21:09

1 Answer

The source code is fairly straightforward; there's a link to it in the documentation page for the filecmp module, at the top:

```python
def cmp(f1, f2, shallow=True):
    """Compare two files.
    Arguments:
    f1 -- First file name
    f2 -- Second file name
    shallow -- Just check stat signature (do not read the files).
               defaults to True.
    Return value:
    True if the files are the same, False otherwise.
    This function uses a cache for past comparisons and the results,
    with cache entries invalidated if their stat information
    changes.  The cache may be cleared by calling clear_cache().
    """

    s1 = _sig(os.stat(f1))
    s2 = _sig(os.stat(f2))
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    if shallow and s1 == s2:
        return True
    if s1[1] != s2[1]:
        return False

    outcome = _cache.get((f1, f2, s1, s2))
    if outcome is None:
        outcome = _do_cmp(f1, f2)
        if len(_cache) > 100:      # limit the maximum size of the cache
            clear_cache()
        _cache[f1, f2, s1, s2] = outcome
    return outcome

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)

def _do_cmp(f1, f2):
    bufsize = BUFSIZE
    with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
        while True:
            b1 = fp1.read(bufsize)
            b2 = fp2.read(bufsize)
            if b1 != b2:
                return False
            if not b1:
                return True
```

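So yes, with shallow=False it will compare the file contents, byte for byte, after first returning False if either path is not a regular file or if the sizes differ. The "seem" wording mostly covers shallow=True, where two files with the same type, size, and modification time are taken to be equal without being read. Note also that results are cached under the file names and stat signatures, so a cached outcome is reused for as long as those signatures do not change.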
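To see both effects in practice, here is a minimal sketch based on the source quoted above. The helper names (`write_with_fixed_mtime`, `demo`), the file names, and the fixed timestamp are made up for illustration; the only library calls used are `filecmp.cmp`, `filecmp.clear_cache`, `os.utime`, and `tempfile.TemporaryDirectory`. It pins the modification time of both files so that their stat signatures match, which is an artificial setup, and the stale-cache part relies on the caching behaviour shown in the quoted implementation.

```python
import filecmp
import os
import tempfile

# Arbitrary fixed modification time in nanoseconds (an assumption for the demo).
STAMP_NS = 1_600_000_000 * 10**9


def write_with_fixed_mtime(path, data):
    """Write data to path, then pin atime/mtime so the stat signature is predictable."""
    with open(path, "wb") as fh:
        fh.write(data)
    os.utime(path, ns=(STAMP_NS, STAMP_NS))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        p1 = os.path.join(tmp, "a.bin")
        p2 = os.path.join(tmp, "b.bin")

        # Same size and mtime, different contents.
        write_with_fixed_mtime(p1, b"aaaa")
        write_with_fixed_mtime(p2, b"bbbb")
        filecmp.clear_cache()
        shallow = filecmp.cmp(p1, p2, shallow=True)   # signatures match, files not read
        deep = filecmp.cmp(p1, p2, shallow=False)     # contents are read and differ

        # Make the files identical and cache a deep comparison.
        write_with_fixed_mtime(p2, b"aaaa")
        filecmp.clear_cache()
        before = filecmp.cmp(p1, p2, shallow=False)

        # Change the contents but keep size and mtime: the signature is unchanged,
        # so the cached outcome is returned instead of re-reading the files.
        write_with_fixed_mtime(p2, b"bbbb")
        stale = filecmp.cmp(p1, p2, shallow=False)

        # Clearing the cache forces a real comparison again.
        filecmp.clear_cache()
        fresh = filecmp.cmp(p1, p2, shallow=False)

        return shallow, deep, before, stale, fresh


if __name__ == "__main__":
    print(demo())
```

With the implementation quoted above, `shallow` should come out True even though the contents differ, `deep` should be False, and `stale` should still be True until `filecmp.clear_cache()` is called. In real use, a content change that leaves both size and modification time untouched is rare, which is presumably why the documentation settles for "seem".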

Colonel Thirty Two