I am trying to compare our current project media server (dir1) with a backup (dir2) to see which documents were deleted. Both are Windows directories. Many of the files have been shuffled around into new sub-directories but are not missing. Because the directory structure has changed, recursing with filecmp.dircmp as described in this post won't work: Recursively compare two directories to ensure they have the same files and subdirectories
The other consideration is that different files can have the same file name, so the comparison will need to look at file size, modification date, etc. to determine whether two files are the same.
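To make the "size and modification date" idea concrete, here is a minimal sketch of a metadata-based check. The names file_signature and probably_same_file are hypothetical, and treating equal size plus equal modification time (truncated to whole seconds) as "the same file" is an assumption; it can give false matches, so a content comparison may still be needed.

```python
import os


def file_signature(path):
    """Return (size in bytes, modification time truncated to whole seconds)."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def probably_same_file(path_a, path_b):
    """Assumption: equal size and equal mtime means the files match.

    The file name is deliberately ignored, because different files
    may share a name and moved files keep their metadata.
    """
    return file_signature(path_a) == file_signature(path_b)
```

This only reads metadata, so it should be cheap even on a large media server.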
What I want (pseudocode):
def find_missing_files(currentDir, backup):
    <does stuff>
    return <List of Files in backup that are not in currentDir>
What I have:
import filecmp
import os

def build_file_list(someDir, fileList = []):
    # Walk someDir recursively and collect the full path of every file.
    for root, dirs, files in os.walk(someDir):
        if files:
            for file in files:
                filePath = os.path.join(root, file)
                if filePath not in fileList:
                    fileList.append(filePath)
    return fileList
def cmp_file_lists(dir1, dir2):
    dir1List = build_file_list(dir1)
    dir2List = build_file_list(dir2)
    for dir2file in dir2List:
        for dir1file in dir1List:
            if filecmp.cmp(dir1file, dir2file):
                dir1List.remove(dir1file)
                dir2List.remove(dir2file)
                break
    return (dir1List, dir2List)
EDIT: In the code above, dir2List.remove(dir2file) throws an error saying that dir2file is not in dir2List because (it appears) dir2List and dir1List are somehow the same object. I don't know how that is happening.
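A minimal reproduction of the symptom, with no file system access, points at the default argument: Python evaluates a default such as fileList = [] once, when the function is defined, so every call that omits the argument seems to receive the same list. The names collect and collect_fixed are hypothetical and exist only for this illustration.

```python
def collect(item, bucket=[]):  # same pattern as fileList = [] in build_file_list
    bucket.append(item)
    return bucket


first = collect("a")
second = collect("b")
print(first is second)  # True: both names refer to the one shared default list
print(first)            # ['a', 'b']


def collect_fixed(item, bucket=None):
    # Create a fresh list on every call instead of sharing one default.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(collect_fixed("a") is collect_fixed("b"))  # False
```

If that is the cause here, switching build_file_list to a None default would presumably give dir1List and dir2List separate lists.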
I don't know whether this could be done more easily with filecmp.dircmp and I am just missing it, or whether this is the best approach for what I am after. Or should I take each file from dir2 and use os.walk to look for it in dir1?
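One direction I am considering, as a rough sketch rather than a tested solution: index the files in the current directory by size, then run filecmp.cmp only against same-size candidates. This assumes that "the same file" means byte-identical content regardless of name or location; the parameter names current_dir and backup_dir are my own.

```python
import filecmp
import os
from collections import defaultdict


def find_missing_files(current_dir, backup_dir):
    """Return backup paths with no byte-identical file under current_dir."""
    # Index every current file by size so only same-size files get compared.
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(current_dir):
        for name in files:
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)

    missing = []
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            backup_path = os.path.join(root, name)
            candidates = by_size.get(os.path.getsize(backup_path), [])
            # shallow=False makes filecmp.cmp compare the file contents.
            if not any(filecmp.cmp(backup_path, candidate, shallow=False)
                       for candidate in candidates):
                missing.append(backup_path)
    return missing
```

Neither list is modified while it is being iterated, which avoids the remove-inside-loop pattern in cmp_file_lists, and the size index keeps the number of content comparisons well below one per pair of files.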