
I am attempting to write a string directly into a Django FileField by way of ContentFile.

In doing so, I get a reproducible

TypeError: Unicode-objects must be encoded before hashing

error when attempting to save the contents of this file to the database, which traces through the s3boto3 lib.

The exact source of this error is difficult to suss out.

To state the question plainly: in Python 3, on Django 2.2.x, what is the correct way to take a CSV file created with Python's csv library and save it into a Django FileField backed by Amazon S3?

This question, and my approach, is inspired by this entry on SO: "Django - how to create a file and save it to a model's FileField?". However, given the age of that answer, some details relevant to newer versions of Django appear to have been left out. It is difficult to tell.

Example code producing the error in question, truncated for privacy and relevance

import csv
import io


def campaign_to_csv_string(campaign_id):
    # Collect the CSV output in an in-memory text buffer
    csv_string = io.StringIO()

    campaign = Campaign.objects.get(pk=campaign_id)
    checklist = campaign.checklist

    completed_jobs = JobRecord.objects.filter(appointment__campaign=campaign)

    writer = csv.writer(csv_string)

    # A bunch of writing to the writer here

    # string looks good at this point

    # Note: getvalue() returns a str, not bytes
    return csv_string.getvalue()

Calling function:

    csv_string = campaign_to_csv_string(campaign_report.campaign.pk)

    campaign_report.last_run = datetime.datetime.now()

    campaign_report.report_file.save(str(campaign_report_pk) + '.report', ContentFile(csv_string))

    campaign_report.processing = False

    campaign_report.save()

My guess here is that s3boto3 is taking issue with ContentFile, but the debugging information sent back to me gives me no clear path forward.

edit

Stack trace by request

TypeError: Unicode-objects must be encoded before hashing
  File "celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "celery/app/trace.py", line 648, in __protected_call__
    return self.run(*args, **kwargs)
  File "main/tasks.py", line 94, in produce_basic_campaign_report
    campaign_report.report_file.save(str(campaign_report_pk) + '.report', csv_file)
  File "django/db/models/fields/files.py", line 87, in save
    self.name = self.storage.save(name, content, max_length=self.field.max_length)
  File "django/core/files/storage.py", line 52, in save
    return self._save(name, content)
  File "storages/backends/s3boto3.py", line 491, in _save
    self._save_content(obj, content, parameters=parameters)
  File "storages/backends/s3boto3.py", line 506, in _save_content
    obj.upload_fileobj(content, ExtraArgs=put_parameters)
  File "boto3/s3/inject.py", line 621, in object_upload_fileobj
    ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
  File "boto3/s3/inject.py", line 539, in upload_fileobj
    return future.result()
  File "s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "s3transfer/futures.py", line 265, in result
    raise self._exception
  File "s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "botocore/client.py", line 642, in _make_api_call
    request_signer=self._request_signer, context=request_context)
  File "botocore/hooks.py", line 360, in emit_until_response
    return self._emitter.emit_until_response(aliased_event_name, **kwargs)
  File "botocore/hooks.py", line 243, in emit_until_response
    responses = self._emit(event_name, kwargs, stop_on_response=True)
  File "botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "botocore/handlers.py", line 212, in conditionally_calculate_md5
    calculate_md5(params, **kwargs)
  File "botocore/handlers.py", line 190, in calculate_md5
    binary_md5 = _calculate_md5_from_file(body)
  File "botocore/handlers.py", line 204, in _calculate_md5_from_file
    md5.update(chunk)
M. Ryan

1 Answer


The CSV string needs to be encoded as bytes when instantiating the ContentFile.

The error can be reproduced this way:

from django.core.files.base import ContentFile
from botocore.handlers import _calculate_md5_from_file

_calculate_md5_from_file(ContentFile('throws error'))
TypeError: Unicode-objects must be encoded before hashing.

The content isn't internally converted to bytes unless it is a gzip mimetype or explicitly compressed. See https://github.com/jschneier/django-storages/blob/1.7.2/storages/backends/s3boto.py#L417

_calculate_md5_from_file expects a file containing bytes, and the same is true of the underlying boto3 S3 client's put_object method.
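The root cause can be seen without Django or botocore at all. The sketch below is a simplified stand-in for a chunked MD5 routine, not botocore's actual code; `md5_of_file` and the sample text are hypothetical names used only for illustration. It shows that hashlib rejects str chunks (what a text-mode file yields) and accepts bytes chunks.

```python
import hashlib
import io


def md5_of_file(fileobj, chunk_size=1024 * 1024):
    """Hash a file-like object chunk by chunk (illustrative stand-in only)."""
    md5 = hashlib.md5()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # hashlib only accepts bytes-like input; a str chunk raises TypeError
        md5.update(chunk)
    return md5.hexdigest()


text = "id,name\r\n1,example\r\n"

# A text buffer yields str chunks, so hashing fails
try:
    md5_of_file(io.StringIO(text))
    text_hash_failed = False
except TypeError:
    text_hash_failed = True

# A bytes buffer yields bytes chunks, so hashing succeeds
bytes_digest = md5_of_file(io.BytesIO(text.encode("utf-8")))
```

The exact wording of the TypeError message varies between Python versions, so the sketch checks only the exception type.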

I suggest encoding csv_string as bytes.

    campaign_report.report_file.save(
        str(campaign_report_pk) + '.report', 
        ContentFile(
            csv_string.encode()
        )
    )
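As a rough sketch of the whole path from rows to bytes, the helper below writes CSV into a text buffer and encodes the result once at the end. `rows_to_csv_bytes` and the sample rows are hypothetical and not part of the original code; the assumption is that the report fits comfortably in memory.

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding="utf-8"):
    """Serialize rows to CSV text, then encode the text to bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(rows)
    # getvalue() returns str; encode() turns it into bytes for upload
    return buffer.getvalue().encode(encoding)


payload = rows_to_csv_bytes([["id", "name"], [1, "example"]])
```

In the Django code above, `payload` would take the place of `csv_string.encode()` inside `ContentFile(...)`. In Python 3, `str.encode()` defaults to UTF-8, so passing the encoding explicitly only makes the choice visible.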
Oluwafemi Sule