I have an app that currently allows a user to upload a file and saves the file on the web server. My client has now decided to use a third-party cloud hosting service for their file storage needs. The company has its own API for doing CRUD operations on their server, so I wrote a script to test the API; it sends a file as a base64-encoded JSON payload. The script works fine, but now I'm stuck on exactly how I should implement this functionality in Django.

json_testing.py

import base64
import json
import requests
import magic

filename = 'test.txt'

# Open file and read file and encode it as a base64 string
with open(filename, "rb") as test_file:
    encoded_string = base64.b64encode(test_file.read())

# Get MIME type using magic module
mime = magic.Magic(mime=True)
mime_type = mime.from_file(filename)

# Concatenate MIME type and encoded string with string data
# Use .decode() on byte data for mime_type and encoded string
file_string = 'data:%s;base64,%s' % (mime_type.decode(), encoded_string.decode())
payload = {
    "client_id": 1,
    "file": file_string
}
headers = {
    "token": "AuthTokenGoesHere",
    "content-type": "application/json",
}
request = requests.post('https://api.website.com/api/files/', json=payload, headers=headers)
print(request.json())

models.py

def upload_location(instance, filename):
    return '%s/documents/%s' % (instance.user.username, filename)

class Document(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    file = models.FileField(upload_to=upload_location)

    def __str__(self):
        return self.filename()

    def filename(self):
        return os.path.basename(self.file.name)

So to reiterate, when a user uploads a file, instead of storing the file somewhere on the web server, I want to base64 encode the file so I can send the file as a JSON payload. Any ideas on what would be the best way to approach this?

MassDefect_
  • I'm not sure I understand. You already know how to do an appropriate request, where's the problem? – freakish Jun 20 '16 at 16:28
  • @freakish. Instead of physically saving the file on the web server, I just want to encode the file, send it as a payload, and then discard the file. Do I have to upload the file, then do the encoding, then send it as a JSON payload, then delete the file? I was wondering if there was some way I could encode the file in memory without having to save it to the web server – MassDefect_ Jun 20 '16 at 16:30
  • Just call your script from the Django app instead of what you used to do; I don't get it. – Ohad the Lad Jun 20 '16 at 16:34
  • @OhadtheLad. The simplest way I can put this is that I want to avoid saving the file to the web server entirely. I just want to encode the file, send it as a payload, and discard it, if that's possible. I was just wondering if there was another way of doing this than uploading the file, saving it to the web server, then encoding it, then sending the payload, then deleting the file. – MassDefect_ Jun 20 '16 at 16:37
  • I'm sure u can, just hold the file data in a buffer and encode it to another buffer, send the last one, del both from memory...http://stackoverflow.com/questions/23164058/how-to-encode-text-to-base64-in-python – Ohad the Lad Jun 20 '16 at 16:41
  • @nastyn8 You don't have to save the file onto hard drive as long as it fits in memory (but I assume that these files are small since you are using base64 encoding which is very inefficient). Actually it's pretty much the same procedure, except you change the underlying storage. Are you saying you don't know how to do that? Is that the question? – freakish Jun 20 '16 at 17:08

1 Answer

The simplest way I can put this is that I want to avoid saving the file to the web server entirely. I just want to encode the file, send it as a payload, and discard it, if that's possible.

From the Django docs:

Upload Handlers

When a user uploads a file, Django passes off the file data to an upload handler – a small class that handles file data as it gets uploaded. Upload handlers are initially defined in the FILE_UPLOAD_HANDLERS setting, which defaults to:

["django.core.files.uploadhandler.MemoryFileUploadHandler", "django.core.files.uploadhandler.TemporaryFileUploadHandler"]

Together MemoryFileUploadHandler and TemporaryFileUploadHandler provide Django’s default file upload behavior of reading small files into memory and large ones onto disk.

You can write custom handlers that customize how Django handles files. You could, for example, use custom handlers to enforce user-level quotas, compress data on the fly, render progress bars, and even send data to another storage location directly without storing it locally. See Writing custom upload handlers for details on how you can customize or completely replace upload behavior.

Contrary thoughts:

I think you should consider sticking with the default file upload handlers because they keep someone from uploading a file that will overwhelm the server's memory.

Where uploaded data is stored

Before you save uploaded files, the data needs to be stored somewhere.

By default, if an uploaded file is smaller than 2.5 megabytes, Django will hold the entire contents of the upload in memory. This means that saving the file involves only a read from memory and a write to disk and thus is very fast.

However, if an uploaded file is too large, Django will write the uploaded file to a temporary file stored in your system’s temporary directory. On a Unix-like platform this means you can expect Django to generate a file called something like /tmp/tmpzfp6I6.upload. If an upload is large enough, you can watch this file grow in size as Django streams the data onto disk.

These specifics – 2.5 megabytes; /tmp; etc. – are simply “reasonable defaults” which can be customized as described in the next section.
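
As a rough illustration of that customization, the two settings behind those defaults could look like the fragment below. `FILE_UPLOAD_MAX_MEMORY_SIZE` and `FILE_UPLOAD_TEMP_DIR` are Django settings; the 5 MB value is only an assumption chosen for the example, not a recommendation.

```python
# settings.py (fragment) -- the values here are illustrative assumptions

# Uploads larger than this many bytes are streamed to a temporary file
# instead of being held in memory. Django's default is 2621440 (2.5 MB).
FILE_UPLOAD_MAX_MEMORY_SIZE = 5 * 1024 * 1024

# Directory for temporary upload files. None (the default) means the
# system's standard temporary directory, e.g. /tmp on Unix-like systems.
FILE_UPLOAD_TEMP_DIR = None
```

Raising the threshold keeps more uploads off the disk, at the cost of more memory per request.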


request.FILES info:

#forms.py:

from django import forms

class UploadFileForm(forms.Form):
    title = forms.CharField(max_length=50)
    json_file = forms.FileField()

A view handling this form will receive the file data in request.FILES, which is a dictionary containing a key for each FileField (or ImageField, or other FileField subclass) in the form. So the data from the above form would be accessible as request.FILES['json_file'].

Note that request.FILES will only contain data if the request method was POST and the <form> that posted the request has the attribute enctype="multipart/form-data". Otherwise, request.FILES will be empty.


HttpRequest.FILES

A dictionary-like object containing all uploaded files. Each key in FILES is the name from the <input type="file" name="" />. Each value in FILES is an UploadedFile.


The source code for TemporaryFileUploadHandler contains this:

class TemporaryFileUploadHandler(FileUploadHandler):
    """
    Upload handler that streams data into a temporary file.
    """
    ...
    ...
    def new_file(self, *args, **kwargs):
        """
        Create the file object to append to as data is coming in.
        """
        ...
        self.file = TemporaryUploadedFile(....)  #<***HERE

And the source code for TemporaryUploadedFile contains this:

class TemporaryUploadedFile(UploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, content_type_extra=None):
        ...
        file = tempfile.NamedTemporaryFile(suffix='.upload')  #<***HERE

And the Python tempfile docs say this:

tempfile.NamedTemporaryFile(...., delete=True)
...
If delete is true (the default), the file is deleted as soon as it is closed.

Similarly, the other of the two default file upload handlers, MemoryFileUploadHandler, creates a file of type BytesIO:

A stream implementation using an in-memory bytes buffer. It inherits BufferedIOBase. The buffer is discarded when the close() method is called.
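
Both behaviours can be checked with the standard library alone. The following is a small sketch, independent of Django, that mimics the two storage types described above; the variable names are made up for the demonstration.

```python
import io
import os
import tempfile

# Disk-backed case: the same kind of file TemporaryUploadedFile creates.
tmp = tempfile.NamedTemporaryFile(suffix='.upload')
tmp.write(b'example upload data')
tmp.flush()
path = tmp.name
exists_before_close = os.path.exists(path)  # present on disk while open
tmp.close()
exists_after_close = os.path.exists(path)   # removed by close()

# In-memory case: a BytesIO buffer is discarded on close().
buf = io.BytesIO(b'example upload data')
buf.close()
try:
    buf.read()
    readable_after_close = True
except ValueError:  # raised for I/O on a closed buffer
    readable_after_close = False

print(exists_before_close, exists_after_close, readable_after_close)
# True False False
```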

Therefore, all you have to do is close request.FILES["field_name"] to erase the file (whether the file contents are stored in memory or on disk in the /tmp file directory), e.g.:

uploaded_file = request.FILES["json_file"]
file_contents = uploaded_file.read()

# Send file_contents to other server here.

uploaded_file.close()  # erases file
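
Tying this back to the script in the question, the encoding step might be sketched as below. `build_payload` and `FakeUpload` are hypothetical names introduced for this example. The sketch assumes the MIME type reported by the browser (the uploaded file's `content_type` attribute) is acceptable in place of the `magic` lookup, and it takes the payload shape from the question's test script.

```python
import base64
import io


def build_payload(uploaded_file, client_id):
    """Encode an uploaded file as a base64 data URI inside a JSON-ready dict.

    uploaded_file is anything with read(), close() and a content_type
    attribute, such as the UploadedFile objects found in request.FILES.
    """
    try:
        encoded = base64.b64encode(uploaded_file.read()).decode('ascii')
        # Assumption: fall back to a generic type if none was supplied.
        mime_type = uploaded_file.content_type or 'application/octet-stream'
    finally:
        uploaded_file.close()  # discards the in-memory buffer or temp file
    return {
        'client_id': client_id,
        'file': 'data:%s;base64,%s' % (mime_type, encoded),
    }


class FakeUpload(io.BytesIO):
    """Hypothetical stand-in for an UploadedFile, for demonstration only."""
    content_type = 'text/plain'


payload = build_payload(FakeUpload(b'hello'), client_id=1)
print(payload['file'])  # data:text/plain;base64,aGVsbG8=
```

In a view, the same function would presumably be called with request.FILES["json_file"], and the resulting dict passed to requests.post(..., json=payload, headers=headers) exactly as in the question's script.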

If for some reason you don't want Django to write to the server's /tmp directory at all, then you'll need to write a custom file upload handler to reject uploaded files that are too large.

7stud
  • @nastyn8, I thought about this some more, and the default file upload handlers are well thought out, so there is no need to create a custom upload handler to do what you want. See additional explanation at the bottom of my answer. – 7stud Jun 21 '16 at 23:19