1

Summary: How can I use Python to locate two files with identical names (in known locations) and see if they are identical using hash.

I currently have two folders, Folder1 and Folder2. I am trying to figure out how I can have Python move file by file through Folder1. For each file (which is a .jpg image), I would like to use hash to get a unique integer for the image and copy the file name to a string. From this string, I plan to find the potential identical copy of that image in Folder2 and then use hash to see if they are identical.

I am new to Python and this is my first post on Stack Overflow. If there is any information I should include or area where I was unclear, please let me know and I will respond as soon as possible. Thanks, and thank you to Ares for recommending the use of hash.

Justlieb
  • 35
  • 1
  • 6

1 Answers1

1

Comparing file size really isn't enough, is it? You could easily have two files, a.jpg in folder A and a.jpg in folder B. They are both exactly 16 kb, except that one is a picture of a dog and one is a picture of a cat.

What may be valuable is to hash every image in the list. In Python, you can take anything - numbers, strings, images, etc - and call hash() on it.

A hash is a numeric representation of a specific set of data. Barring some rare exceptions, hashes are exactly unique to that data. No other sets of data, except those that are exactly the same, will have that hash.

Some examples:

> hash('test')

will output

2314058222102390712

> hash('Test')

-1504849438355502056

This question describes how to load an image in Python. Then, just call hash on each image. This describes what a hash is if I was unclear.

Community
  • 1
  • 1
Athena
  • 3,200
  • 3
  • 27
  • 35