0

I apologize if this has been asked already, or if this is answered somewhere else.

Anyways, I'm working on a project that, in short, stores image metadata and then allows the user to search said metadata (which resembles a long list of key-value pairs). This wouldn't be too big of an issue if the metadata was standardized. However, the problem is that for any given image in the database, there is any number of key/values in its metadata. Also there is no standard list of what keys there are.

Basically, I need to find a way to store a dictionary for each model, but with arbitrary key/value pairs. And I need to be able to query them. And the organization I'm working for is planning on uploading thousands of images to this program, so it has to query reasonably fast.

I have one model in my database, an image model, with a filefield.

So, I'm in between two options, and I could really use some help from people with more experience on choosing the best one (or any other solutions that would work better)

  1. Using a traditional relational database like MySql, and creating a separate model with a foreignkey to the image model, a key field, and a value field. Then, when I need to query the data, I'll ask for every instance of this separate table that relates to an image, and then query those rows for the key/value combination I need.

  2. Using something like MongoDB, with django-toolbox and its DictField to store the metadata. Then, when I need to query, I'll access the dict and search it for the key/value combination I need.

While I feel like 1 would be much better in terms of query time, each image may have up to 40 key/values of metadata, and that makes me worry about that separate "dictionary" table growing far too large if there's thousands of images.

Any advice would be much appreciated. Thanks!

Community
  • 1
  • 1
  • Does this answer your question? [arbitrary typed data in django model](https://stackoverflow.com/questions/2637438/arbitrary-typed-data-in-django-model) – outis Jan 04 '22 at 02:35

2 Answers2

0

What's the type of metadata? Both key and value are string? I assume it's the case.

The scale of your dataset matters. If you will have up to thousands images and each image has up to 40 key-value pairs, then in option 1, the separate table would have at most 400k records. That's no problem for modern database, as long as you have not bad machine and correct DB settings. One issue to take care is to composite index fields in the table. In Django ORM, it would be something like:

class ImageMeta(models.Model):
    image = models.ForeignKey('Image')
    key = models.CharField(max_length=XXXX)
    value = models.CharField(max_length=XXXX)
    class Meta:
        index_together = [ ["image", "key", "value"], ]     # Django 1.5 and above
ZZY
  • 3,689
  • 19
  • 22
0

In a Django project you've got 4 alternatives for this kind of problem, in no particular order:

  • using PostgreSQL, you can use the hstore field type, that's basically a pickled python dictionary. It's not very helpful in terms of querying it, but does its job saving your data.

  • using Django-NoRel with mongodb you get the ListField field type that does the same thing and can be queried just like anything in mongo. (option 2)

  • using Django-eav to create an entity attribute value store with your data. Elegant solution but painfully slow queries. (option 1)

  • storing your data as a json string in a long enough TextField and creating your own functions to serializing and deserializing the data, without thinking on being able to make a query over it.

In my own experience, if you by any chance need to query over the data, your option two is by far the best choice. EAV in Django, without composite keys, is painful.

fixmycode
  • 8,220
  • 2
  • 28
  • 43