Here is a simple approach with a 256 bit image-hash (MD5 has 128 bit)
- resize the picture to 16x16 pixel

- reduce colors to black/white (which equals true/false in this console output)

- read the boolean values into
List<bool>
- this is the hash
Code:
public static List<bool> GetHash(Bitmap bmpSource)
{
List<bool> lResult = new List<bool>();
//create new image with 16x16 pixel
Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16));
for (int j = 0; j < bmpMin.Height; j++)
{
for (int i = 0; i < bmpMin.Width; i++)
{
//reduce colors to true / false
lResult.Add(bmpMin.GetPixel(i, j).GetBrightness() < 0.5f);
}
}
return lResult;
}
I know, GetPixel
is not that fast but on a 16x16 pixel image it should not be the bottleneck.
- compare this hash to hash values from other images and add a tolerance.(number of pixels that can differ from the other hash)
Code:
List<bool> iHash1 = GetHash(new Bitmap(@"C:\mykoala1.jpg"));
List<bool> iHash2 = GetHash(new Bitmap(@"C:\mykoala2.jpg"));
//determine the number of equal pixel (x of 256)
int equalElements = iHash1.Zip(iHash2, (i, j) => i == j).Count(eq => eq);
So this code is able to find equal images with:
- different file formats (e.g. jpg, png, bmp)
- rotation (90, 180, 270), horizontal /vertical flip - by changing the iteration order of
i
and j
- different dimensions (same aspect is required)
- different compression (tolerance is required in case of quality loss like jpeg artifacts) - you can accept a 99% equality to be the same image and 50% to be a different one.
- colored changed to geyscaled and the other way round (because brightness is independent of color)
Update / Improvements:
after using this method for a while I noticed a few improvements that can be done
- replacing
GetPixel
for more performance
- using the exeif-thumbnail instead of reading the whole image for a performance improvement
- instead of setting
0.5f
to differ between light and dark - use the distinct median brightness of all 256 pixels. Otherwise dark/light images are assumed to be the same and it enables to detect images which have a changed brightness.
- if you need fast calculations, use
bool[]
or List<bool>
if you need to store a lot hashes with the need to save memory, use a Bitarray
because a Boolean isn't stored in a bit, it takes a byte!