The problem is that I've got a folder with more than 80k images, and about 40% of them are duplicates (some of the pictures are rotated and some have a different size, but it's still the same image).
At first I used a hashing algorithm (in C++/Java) to delete all the duplicate images that have the same size and other properties. But it didn't delete all of them, because some pictures have a different size but are visually identical.
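For reference, the exact-hash pass looked roughly like the Java sketch below. The choice of SHA-256, the class name `ExactDuplicateFinder`, and the method names are assumptions made for illustration, not my exact code. It only flags files whose bytes are identical, which is probably why resized or rotated copies slip through.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds byte-identical files in one folder.
public class ExactDuplicateFinder {

    // Hex-encoded SHA-256 of the file's raw bytes (SHA-256 is an assumption).
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Reading through the DigestInputStream feeds the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns every file whose bytes match an earlier file (in sorted order).
    static List<Path> findDuplicates(Path folder) throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> s = Files.list(folder)) {
            files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, Path> seen = new HashMap<>();
        List<Path> duplicates = new ArrayList<>();
        for (Path f : files) {
            // putIfAbsent returns the earlier file if this digest was already seen.
            Path first = seen.putIfAbsent(sha256(f), f);
            if (first != null) {
                duplicates.add(f);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws Exception {
        // Prints the duplicates instead of deleting them, to be safe.
        for (Path p : findDuplicates(Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

Run it with the image folder as the only argument; it lists the files that could be removed while keeping the first copy of each.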
I've searched a lot online for an efficient algorithm for this problem.
The best code I found for my problem uses pHash, but it's outdated and no longer works with VS.
If anyone has an idea for me, that would be awesome.
Thanks.