I came across this ImageMagick command, which is cool:
compare -metric RMSE first.png second.png NULL:
I tried it, and two similar images (though not modifications of each other) scored <15% difference. But if I wanted to find similar images in a collection of 1M images, I can't really compare each image against every other one; it would be far too time-consuming. Is there a way to fingerprint the images instead (edge fingerprints maybe? edge+color?), store 1-4 KB per image, and use that data to compare how close they are?
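
To make the idea concrete, here is a rough sketch of the kind of fingerprint I have in mind. It is an "average hash", which is only one possible approach and not something ImageMagick does for me: shrink the image to an 8x8 grid of brightness values, set one bit per cell depending on whether it is above the mean, and compare two fingerprints by counting differing bits. The function names and the 8x8 size are my own assumptions, and the input is assumed to be already decoded into a 2D list of grayscale values (loading a real PNG would need an image library).

```python
def average_hash(pixels, hash_size=8):
    """Return a hash_size*hash_size-bit fingerprint of a grayscale image.

    pixels is a 2D list of brightness values (list of rows). This input
    format is an assumption for the sketch; decoding is not handled here.
    """
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")

    # Downsample by averaging the pixels that fall into each grid cell.
    cells = []
    for cy in range(hash_size):
        y0, y1 = cy * height // hash_size, (cy + 1) * height // hash_size
        for cx in range(hash_size):
            x0, x1 = cx * width // hash_size, (cx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))

    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Synthetic example: a horizontal gradient, a brightened copy, an inverted copy.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 10 for v in row] for row in gradient]
inverted = [[255 - v for v in row] for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))
print(hamming_distance(average_hash(gradient), average_hash(inverted)))
```

A fingerprint like this is only 64 bits, far below the 1-4 KB budget, so there would be room for something richer (edges, color), but I don't know whether a hash this simple is good enough for images that are similar without being modifications of each other.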