What I want to do is create a list of file pairs to compare from a directory of N files. The end goal is to compare images to find duplicates regardless of the format. For example, given the files 1.jpg, 2.jpg and 3.jpg, and using this code:
import sys, os

def main(argv):
    # Both listings come from the same directory, passed as the first argument.
    list1 = os.listdir(argv[0])
    list2 = os.listdir(argv[0])
    file_compare_list = []
    # Pair every file with every file, including itself.
    for pic1 in list1:
        for pic2 in list2:
            file_compare_list.append([pic1, pic2])
    print(file_compare_list)

if __name__ == "__main__":
    main(sys.argv[1:])
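The nested loops above build the Cartesian product of the directory listing with itself. As a sketch, the same list can be produced with `itertools.product`. The hard-coded `names` list stands in for `os.listdir(argv[0])` here, and `all_pairs` is a hypothetical helper name:

```python
import itertools


def all_pairs(names):
    """Return every ordered pair [a, b], including a == b."""
    # product(names, repeat=2) yields pairs in the same order as the
    # nested loops: the first element varies slowest.
    return [list(p) for p in itertools.product(names, repeat=2)]


names = ["1.jpg", "2.jpg", "3.jpg"]  # stand-in for os.listdir(argv[0])
pairs = all_pairs(names)
print(len(pairs))          # 9
print(pairs[2], pairs[6])  # ['1.jpg', '3.jpg'] ['3.jpg', '1.jpg']
```

For N files this always gives N × N pairs, which is where both kinds of duplicate come from.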
I get a list like this
[['1.jpg', '1.jpg'], #0
['1.jpg', '2.jpg'], #1
['1.jpg', '3.jpg'], #2
['2.jpg', '1.jpg'], #3
['2.jpg', '2.jpg'], #4
['2.jpg', '3.jpg'], #5
['3.jpg', '1.jpg'], #6
['3.jpg', '2.jpg'], #7
['3.jpg', '3.jpg']] #8
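A small sketch that classifies each numbered pair, assuming the listing order shown above and a hard-coded `names` list in place of `os.listdir`: pair number k combines outer index i with inner index j, where (i, j) = divmod(k, N). Pairs with i == j compare a file with itself, and every pair with i > j mirrors an earlier pair with i < j, so only N(N − 1)/2 of the N² pairs are worth keeping:

```python
names = ["1.jpg", "2.jpg", "3.jpg"]
n = len(names)

# The full listing from the nested loops, in the same order.
pairs = [[a, b] for a in names for b in names]

same_file, mirrored, unique = [], [], []
for k in range(len(pairs)):
    i, j = divmod(k, n)  # outer and inner loop indices for pair k
    if i == j:
        same_file.append(k)
    elif i > j:
        mirrored.append(k)
    else:
        unique.append(k)

print(same_file)  # [0, 4, 8]
print(mirrored)   # [3, 6, 7]
print(unique)     # [1, 2, 5]
```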
Now I could go through the list and be sure that every file gets compared with every other file, but there are obvious duplicates. Indices 0, 4 and 8 are easy to handle: I can compare the file names and discard those pairs. What concerns me more are pairs like indices 2 and 6, which contain the same two files in opposite order, so processing both would perform the same comparison twice. Any help with this would be greatly appreciated.
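One possible approach, sketched under the assumption that comparing A with B gives the same answer as comparing B with A: `itertools.combinations(names, 2)` emits each unordered pair of distinct elements exactly once, so neither self-pairs (0, 4, 8) nor mirrored pairs (such as 6 after 2) are produced. The equivalent plain-loop version starts the inner index one past the outer index. `unique_pairs` and `unique_pairs_by_index` are hypothetical helper names, and the hard-coded `names` list stands in for the directory listing:

```python
import itertools


def unique_pairs(names):
    """Return each unordered pair of distinct names exactly once."""
    # combinations() never pairs an element with itself and yields
    # (a, b) but not (b, a), so both kinds of duplicate disappear.
    return [list(p) for p in itertools.combinations(names, 2)]


def unique_pairs_by_index(names):
    """Same result with plain loops: j always starts after i."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            pairs.append([names[i], names[j]])
    return pairs


names = ["1.jpg", "2.jpg", "3.jpg"]  # stand-in for os.listdir(argv[0])
print(unique_pairs(names))
# [['1.jpg', '2.jpg'], ['1.jpg', '3.jpg'], ['2.jpg', '3.jpg']]
```

In the original script, the two `os.listdir` calls and the nested loops could then be replaced by something like `file_compare_list = unique_pairs(sorted(os.listdir(argv[0])))`. The `sorted()` call is optional; it only makes the output order predictable, since `os.listdir` returns names in arbitrary order.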