1

In a django application, I will have a database of images all classified under the image class in my models.py. Unfortunately there is a possibility that some of these images are duplicates of each other, and I want to write an application that (in part) will allow me to tag these duplicate images. Being new to database setup like this, what is the best way to implement this into my models.py?

My models.py is as follows:

class duplicate(models.Model):
    #some kind of code goes here?
    #perhaps...
    models.ImageField(upload_to='directory/') #not uploading a new image here- just want to link it to a database full of images somehow?

class image(models.Model):
    image = models.ImageField(upload_to='directory/')
    duplicate = models.ManyToManyField(duplicate, null=True) #is this the correct way to do this?
Paulo Scardine
  • 73,447
  • 11
  • 124
  • 153
mh00h
  • 1,824
  • 3
  • 25
  • 45
  • You can hack the Model.save/delete methods to store the image name and checksum in the database, then you can have a method that counts the number of images with the same checksum. – Paulo Scardine Jul 14 '13 at 21:05
  • Not all of the images will be cropped perfectly the same, but I really like where we're going here. In my case, I have a database full of images that could be duplicates of each other (but not the same scans, so they'll checksum differently). I need a way to say, "this image looks really similar to that other one I saw a few hours ago. I want them to be linked and include a description of why." It doesn't have to be automagic, just a way for me to say "these two images that I uploaded once upon a time are related." A manytomany relationship, if you will, of multiple images (`class image`'s). – mh00h Jul 14 '13 at 23:46
  • Exactly. As a result, all I am wanting to do is to mark in my database that two images, already in the database, are duplicates of each other. A user will be manually tagging the images as duplicates of each other. I just don't know how to define the many-to-many relationship in my models. A computer will not be discovering the duplicates, a user will. – mh00h Jul 15 '13 at 04:38
  • 1
    See the update on how to crate a recursive relationship. – Paulo Scardine Jul 15 '13 at 11:25

2 Answers2

2

You can hack the Model.save/delete methods to store the image name and checksum in the database, then you can have a method that counts the number of images with the same checksum.

Untested, just to get you started in the right direction:

class ImageAccounting(models.Model):
    fk = models.IntegerField()
    model_name = models.CharField(max_length=100)
    md5 = models.CharField(max_length=32)

class SomeModel(models.Model)
    ...
    image = models.ImageField(upload_to='somewhere')
    ...
    def image_signature(self):
        md5 = hashlib.md5(self.image.file.read()).hexdump()
        model_name = self.__name__
        return md5, model_name

    def save(self, *args, *kwargs):
        super(SomeModel, this).save(*args, **kwargs)
        md5, model_name = self.image_signature()
        try:
            i = ImageAccounting.objects.get(fk=self.pk, md5=md5, model_name=model_name)
        except ImageAccounting.DoesNotExist:
            i = ImageAccounting(fk=self.pk, md5=md5, model_name=model_name)
            i.save()

    def delete(self, *args, **kwargs):
        super(SomeModel, this).delete(*args, **kwargs)
        md5, model_name = self.image_signature()
        ImageAccounting.objects.filter(fk=self.pk, md5=md5, model_name=model_name)\
              .delete()

    def copies(self):
        md5, _ = self.image_signature()
        return ImageAccounting.objects.filter(md5=md5)

[Update]

Not all of the images will be cropped perfectly the same, but I really like where we're going here. In my case, I have a database full of images that could be duplicates of each other (but not the same scans, so they'll checksum differently). I need a way to say, "this image looks really similar to that other one I saw a few hours ago. I want them to be linked and include a description of why." It doesn't have to be automagic, just a way for me to say "these two images that I uploaded once upon a time are related." A manytomany relationship, if you will, of multiple images (class image's). – mh00h

If the images are not exact duplicates, we are entering the field of fuzzy databases and computer vision. These are not among the easier subjects of CS and I'm afraid a complete answer would not fit this space, but it is doable - OpenCV has a Python interface and it is the kind of project that benefits from the fast prototyping enabled by Python.

As a result, all I am wanting to do is to mark in my database that two images, already in the database, are duplicates of each other. A user will be manually tagging the images as duplicates of each other. I just don't know how to define the many-to-many relationship in my models. A computer will not be discovering the duplicates, a user will. – mh00h

If a human is classifying the images as duplicates, you just have to create a symmetrical recursive relationship. To create a recursive relationship – an object that has a many-to-one relationship with itself – use models.ManyToManyField('self'), there is no need for an intermediate model:

duplicates = models.ManyToManyField('self', null=True)           
Paulo Scardine
  • 73,447
  • 11
  • 124
  • 153
1

Well, you could use some lib for image processing: These links could be useful: http://atodorov.org/blog/2013/05/17/linux-and-python-tools-to-compare-images

Image Processing, In Python?

Community
  • 1
  • 1
rafaelreuber
  • 103
  • 7
  • though nice to know, my question has more to do with how to implement the django database backend for manual marking of images that have already been imported. – mh00h Jul 14 '13 at 14:50