A news portal company has two servers (both running CentOS 6):
Server #1 has about 1 million images (.jpg, .png), and server #2 has roughly the same number. Some of them are identical duplicates, some are resized duplicates, some are blurred and some are not, and some are completely unique images. The file names are mostly different as well.
The task is to merge the two servers' media catalogues into one. After the merge, duplicates must be removed (to free up storage).
I've run some tests with ImageMagick's compare -metric RMSE, but comparing every file on one server against every file on the other would mean 1 million x 1 million = 1 trillion operations, which would take ages.
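To put a rough number on "ages", here is a back-of-the-envelope sketch in Python. The 10 ms per comparison is an assumed figure for illustration only, not a measurement, and the function names are hypothetical.

```python
# Rough cost estimate for comparing every image on server #1
# against every image on server #2.
# ASSUMPTION: 0.01 s per comparison is an illustrative guess, not a benchmark.

SECONDS_PER_YEAR = 365 * 24 * 3600


def pairwise_comparisons(count_a: int, count_b: int) -> int:
    """Number of cross-server pairs an exhaustive comparison must visit."""
    return count_a * count_b


def estimated_years(comparisons: int, seconds_per_compare: float, workers: int = 1) -> float:
    """Wall-clock years needed, assuming the work splits evenly across workers."""
    total_seconds = comparisons * seconds_per_compare / workers
    return total_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    pairs = pairwise_comparisons(1_000_000, 1_000_000)
    print(f"{pairs:,} comparisons")
    print(f"about {estimated_years(pairs, 0.01):.0f} years on one core")
    print(f"about {estimated_years(pairs, 0.01, workers=8):.0f} years on eight cores")
```

Even if the real per-comparison time turns out to be much lower than the assumed value, the total stays impractical, so comparing each file with each file is not a realistic approach.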
Any suggestions here?