2

I am trying to production-deploy a Django project for the first time. My pre-Django implementation contains quite a bit of data (thousands of images, thousands of entries in the database). I have written a custom manage.py populate_db command that parses the dump of the old database and creates the new Django database based on my models. However I am stuck with adding all the files.

My initial deployment plan is to create the database locally once, sFTP-upload all images to the media folder on the server, do the same with the database and that's it. For this, I need the database to reference the image files in the media folder. As far as I understand, Django does it simply by filename. So far, I've been able to do this:

def _add_image_files(self, user: str):
    i = 0 
    for f in local_files:
        i += 1
        photo = Photo(album=1, number=i, back=False, added_by=user)
        from django.core.files import File
        photo.image.save(f.name, File(open(f.path, 'rb')))

where Photo is defined as

class Photo(models.Model):
    album = models.IntegerField(null=True)
    number = models.IntegerField(null=True)
    image = models.ImageField(null=True, upload_to='photo_image_files')

However Django obviously duplicates every file by reading and writing it to the media folder. I don't need this duplication, I only want to add a reference to each file to the database.

How can I achieve it? Alternatively, if this is for some reason a poor approach, please let me know a better way. Thanks.

texnic
  • 3,959
  • 4
  • 42
  • 75

2 Answers2

0

I have actually found the answer here: https://stackoverflow.com/a/12917845/674976 The "trick" is that ImageField can be simply populated with the image file name. Combined with bulk_create, I now get a reasonable execution time.

 photo_list = []
 for i in range(1000):
     photo = Photo(album=1, number=i, back=False)
     photo.image = 'photo_image_files/:filename_i.jpg'
     photo_list.append(photo)
 Photo.objects.bulk_create(photo_list)

I couldn't find any reference for this trick (assigning file name directly to ImageField), so if you could add one by editing or in comments, it would be a nice improvement to this answer.

Community
  • 1
  • 1
texnic
  • 3,959
  • 4
  • 42
  • 75
0

django version: 3.2.4

Well, you may define your storage, which won't duplicate file, case it already exists. Warning: this solution leads to problems when 2 processes will try to save file with same name. But it fits for programmatically populate the database.

models.py
...
from django.core.files.storage import FileSystemStorage
...
class NonDuplicateStorage(FileSystemStorage):
    OS_OPEN_FLAGS = os.O_WRONLY | os.O_CREAT | getattr(
        os, 'O_BINARY', 0)

    def exists(self, name):
        return False

non_duplicate_storage = NonDuplicateStorage()

class Photo(models.Model):
    ...
    image = models.ImageField(null=True, upload_to='photo_image_files', storage = non_duplicate_storage)

What's happened here: let's open storage class

Lib\site-packages\django\core\files\storage.py

class FileSystemStorage(Storage):
    """
    Standard filesystem storage
    """
    # The combination of O_CREAT and O_EXCL makes os.open() raise OSError if
    # the file already exists before it's opened.
    OS_OPEN_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(
        os, 'O_BINARY', 0)
def _save(self, name, content):
...
        # There's a potential race condition between get_available_name and
        # saving the file; it's possible that two threads might return the
        # same name, at which point all sorts of fun happens. So we need to
        # try to create the file, but if it already exists we have to go back
        # to get_available_name() and try again.
...
        while True:
...
                    fd = os.open(full_path, self.OS_OPEN_FLAGS, 0o666)
...
            except FileExistsError:
                # A new name is needed if the file exists.
                name = self.get_available_name(name)
...

and self.get_available_name call self.exists(), case file already exists - it will be renamed