You can hack the Model.save/delete methods to store the image name and checksum in the database, then you can have a method that counts the number of images with the same checksum.
Untested, just to get you started in the right direction:
class ImageAccounting(models.Model):
fk = models.IntegerField()
model_name = models.CharField(max_length=100)
md5 = models.CharField(max_length=32)
class SomeModel(models.Model)
...
image = models.ImageField(upload_to='somewhere')
...
def image_signature(self):
md5 = hashlib.md5(self.image.file.read()).hexdump()
model_name = self.__name__
return md5, model_name
def save(self, *args, *kwargs):
super(SomeModel, this).save(*args, **kwargs)
md5, model_name = self.image_signature()
try:
i = ImageAccounting.objects.get(fk=self.pk, md5=md5, model_name=model_name)
except ImageAccounting.DoesNotExist:
i = ImageAccounting(fk=self.pk, md5=md5, model_name=model_name)
i.save()
def delete(self, *args, **kwargs):
super(SomeModel, this).delete(*args, **kwargs)
md5, model_name = self.image_signature()
ImageAccounting.objects.filter(fk=self.pk, md5=md5, model_name=model_name)\
.delete()
def copies(self):
md5, _ = self.image_signature()
return ImageAccounting.objects.filter(md5=md5)
[Update]
Not all of the images will be cropped perfectly the same, but I really like where we're going here. In my case, I have a database full of images that could be duplicates of each other (but not the same scans, so they'll checksum differently). I need a way to say, "this image looks really similar to that other one I saw a few hours ago. I want them to be linked and include a description of why." It doesn't have to be automagic, just a way for me to say "these two images that I uploaded once upon a time are related." A manytomany relationship, if you will, of multiple images (class image's). – mh00h
If the images are not exact duplicates, we are entering the field of fuzzy databases and computer vision. These are not among the easier subjects of CS and I'm afraid a complete answer would not fit this space, but it is doable - OpenCV has a Python interface and it is the kind of project that benefits from the fast prototyping enabled by Python.
As a result, all I am wanting to do is to mark in my database that two images, already in the database, are duplicates of each other. A user will be manually tagging the images as duplicates of each other. I just don't know how to define the many-to-many relationship in my models. A computer will not be discovering the duplicates, a user will. – mh00h
If a human is classifying the images as duplicates, you just have to create a symmetrical recursive relationship. To create a recursive relationship – an object that has a many-to-one relationship with itself – use models.ManyToManyField('self')
, there is no need for an intermediate model:
duplicates = models.ManyToManyField('self', null=True)