13

I'm trying to output a CSV file that the user can open with Excel. I've encoded all strings in UTF-8, but when I open the file with Excel I see gibberish. Only after converting the file to UTF-8 with BOM (using Notepad++ on Windows) was I able to display the content properly.

I'm following this pattern from the docs:

def render_to_csv(self, request, qs): 
  response = HttpResponse(content_type='text/csv')
  response['Content-Disposition'] = 'attachment; filename="test.csv"'

  writer = csv.writer(response, delimiter=',')

  for row in qs.values_list(*self.fields_to_export):
    writer.writerow([unicode(v).encode('utf-8') if v is not None else '' for v in row])

  return response

Where does the BOM fit into all of this?

BTW, there are similar questions on SO, but unfortunately none of them are answered.

EDIT

Building on @Alastair McCormack's answer, I ended up explicitly adding the BOM bytes at the beginning of the file. The only difference is that I used the codecs module instead of hard-coding the bytes. Feels awkward, but it does the trick!

import codecs

def render_to_csv(self, request, qs): 
  ... 
  response.write(codecs.BOM_UTF8)
  ...
  return response
haki

4 Answers

12

Add the UTF-8 BOM to the response object before you write your data:

def render_to_csv(self, request, qs): 
  response = HttpResponse(content_type='text/csv')
  response['Content-Disposition'] = 'attachment; filename="test.csv"'

  # UTF-8 BOM as a bytes literal, so the same three bytes are written on Python 2 and 3
  response.write(b"\xEF\xBB\xBF")

  writer = csv.writer(response, delimiter=',')
  …
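
On Python 3, where the csv module writes text rather than bytes, the same idea can be sketched without hard-coding the BOM: build the CSV as text, then encode it with the standard `utf-8-sig` codec, which prepends the BOM. This is only a sketch; the helper name `csv_bytes_with_bom` and the sample rows are made up for illustration and do not come from the question.

```python
import codecs
import csv
import io


def csv_bytes_with_bom(rows):
    """Return the rows as UTF-8 encoded CSV bytes preceded by the BOM.

    Hypothetical helper, not part of Django or the original code.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=",")
    for row in rows:
        # csv.writer already writes None as an empty field.
        writer.writerow(row)
    # The "utf-8-sig" codec emits the three BOM bytes before the encoded text.
    return buffer.getvalue().encode("utf-8-sig")


payload = csv_bytes_with_bom([("café", 1), (None, 2)])
print(payload.startswith(codecs.BOM_UTF8))
```

In a Django view, the resulting bytes could then be passed as the content of the response, for example `HttpResponse(payload, content_type='text/csv')`.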
Alastair McCormack
3

For a CSV served through StreamingHttpResponse, yield the UTF-8 BOM (codecs.BOM_UTF8, i.e. \xEF\xBB\xBF) as the first chunk.

Modified from the example in the official Django documentation:

import csv
import codecs

from django.utils.six.moves import range
from django.http import StreamingHttpResponse


class Echo(object):
    def write(self, value):
        return value


def iter_csv(rows, pseudo_buffer):
    yield pseudo_buffer.write(codecs.BOM_UTF8)
    writer = csv.writer(pseudo_buffer)
    for row in rows:
        yield writer.writerow(row)


def some_streaming_csv_view(request):
    rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
    response = StreamingHttpResponse(iter_csv(rows, Echo()), content_type="text/csv")
    return response
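
To check what such a generator actually emits without running Django, the pieces can be exercised on their own. The sketch below is a Python 3 variant (no `django.utils.six`); the sample rows and the final join are illustrative assumptions. It relies on `writerow()` returning whatever the pseudo-buffer's `write()` returns, which is the behaviour the documented streaming pattern depends on.

```python
import codecs
import csv


class Echo:
    """Pseudo-buffer whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv(rows, pseudo_buffer):
    # First chunk: the three BOM bytes.
    yield pseudo_buffer.write(codecs.BOM_UTF8)
    writer = csv.writer(pseudo_buffer)
    for row in rows:
        # writerow() hands back the formatted line returned by Echo.write().
        yield writer.writerow(row)


chunks = list(iter_csv([["Row 0", "0"], ["Row 1", "1"]], Echo()))

# On Python 3 the BOM chunk is bytes and the row chunks are str;
# normalise everything to bytes to see the stream as a client would.
body = b"".join(c if isinstance(c, bytes) else c.encode("utf-8") for c in chunks)
print(body)
```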
clampist
2

The given answers are great, but here is a hint on how to make the official examples from the Django docs emit the UTF-8 BOM byte sequence at the start of the stream by changing only one line:

import itertools
import codecs 

streaming_content = itertools.chain([codecs.BOM_UTF8], (writer.writerow(row) for row in rows))
response = StreamingHttpResponse(streaming_content, content_type="text/csv")
Yannic Hamann
-1

In my case, the first BOM_UTF8 was somehow removed automatically when the content was returned as an HTTP response, so I had to manually add TWO BOM_UTF8 sequences.

response.write(codecs.BOM_UTF8)
response.write(codecs.BOM_UTF8)
response.write(encoded_csv_content)

I don't know the cause of this, but maybe it will help someone, or someone who knows why can explain in a comment.
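
One way to investigate where a BOM goes missing is to count the BOMs at the start of the raw bytes on each side of the transfer. The sketch below uses a hypothetical helper, `count_leading_boms`, and made-up sample data. It also shows that Python's `utf-8-sig` decoder consumes exactly one leading BOM; whether a BOM-aware decoder on the client side is what strips the BOM in this case is an assumption, not something the thread confirms.

```python
import codecs


def count_leading_boms(data):
    """Count how many UTF-8 BOMs the byte string starts with (hypothetical helper)."""
    count = 0
    # Check for a BOM at each successive 3-byte offset from the start.
    while data.startswith(codecs.BOM_UTF8, count * len(codecs.BOM_UTF8)):
        count += 1
    return count


doubled = codecs.BOM_UTF8 * 2 + b"a,b\r\n"
print(count_leading_boms(doubled))

# A BOM-aware decoder removes only the first BOM; the second one
# survives as the character U+FEFF at the start of the decoded text.
decoded = doubled.decode("utf-8-sig")
print(decoded.startswith("\ufeff"))
```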

  • 1
    I was facing the same issue a while ago, asked a question here on SO and received this answer: https://stackoverflow.com/a/42717677/1143392 which led me to the same "solution"... – Max Dec 16 '20 at 21:21