I have written a web application in Flask. For one of the endpoints, I take some elements from request.form, build a formatted line from a template with format(), and write it to a file. That works fine as long as the contents are ASCII characters. Since this web application has to handle text in Norwegian, it must also handle strings containing the letters æøåÆØÅ. In that case (here with an æ in the input), the application fails with "UnicodeEncodeError: 'ascii' codec can't encode character '\xe6' in position 78: ordinal not in range(128)" on the line file.write(sentence).

It seems like Python is trying to encode my string from ASCII to Unicode, but it fails because the string is already UTF-8.

How can I tell Python that the string I already have is UTF-8?

I have

# -*- coding: utf-8 -*-

as the first line of the file.

The relevant code (slightly abbreviated):

comment = request.form['comment']
author = request.form['author']
service = request.form['service']
host = request.form['host']
now = int(time.time())
rawsentence = "[{}] ACKNOWLEDGE_SVC_PROBLEM;{};{};2;1;1;{};{}"
sentence = rawsentence.format(now, host, service, author, comment)
filename = <SOME FILE>
with open(filename, 'w') as file:
    file.write(sentence)
MortenSickel
  • Where exactly does the error originate? Show a traceback. Perhaps you want `with open(filename, 'w', encoding='utf-8')`…? It looks like your form data is not UTF-8 encoded either though but rather Latin-1, so you need to change something in your HTML/HTTP headers too to ensure the browser sends the data as UTF-8. – deceze Jan 15 '20 at 09:20
  • The encoding= did the trick. Please rewrite it as an answer so I can accept it... (But the data were UTF-8.) – MortenSickel Jan 15 '20 at 09:49
  • Morten, when you have a `str` object in Python 3, it's not UTF-8 anymore – it's a decoded string with codepoints. (In fact, it is internally represented with ASCII, UTF-16 or UTF-32, depending on the data, but this is abstracted away from the user.) No matter what the text was originally encoded with (UTF-8 or something else), it is now decoded, and you can encode it again (using UTF-8 or another codec) for writing it to disk or sending it over the network. – lenz Jan 15 '20 at 10:53
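
To illustrate the point in the last comment, here is a minimal sketch of the difference between a decoded `str` and its encoded bytes. The sample text and the helper name `can_encode` are made up for this example and do not come from the question's code.

```python
# A str holds decoded code points; bytes only exist after encoding.
text = "æøå"  # hypothetical sample input, already a decoded str

# The same text yields different byte sequences under different codecs.
utf8_bytes = text.encode("utf-8")      # two bytes per letter for these characters
latin1_bytes = text.encode("latin-1")  # one byte per letter


def can_encode(s: str, codec: str) -> bool:
    """Return True if every character in s can be represented in codec."""
    try:
        s.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


print(len(text), len(utf8_bytes), len(latin1_bytes))
print(can_encode(text, "utf-8"), can_encode(text, "ascii"))
```

The ASCII codec is the one that cannot represent these letters, which is the same failure the question's traceback reports when the file is written.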

1 Answer

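Set the encoding explicitly when opening the file. Without an encoding argument, open() falls back to the locale's preferred encoding, which on your server is evidently ASCII, so any non-ASCII character fails on write:

with open(filename, 'w', encoding='utf-8') as file:
    file.write(sentence)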
clubby789
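
For anyone who wants to verify the fix locally, the following is a small self-contained sketch, not the asker's actual code. The helper `write_line`, the sample sentence, and the file name `command.txt` are assumptions made for illustration. It writes a line with an explicit UTF-8 encoding, reads it back, and shows that forcing the ASCII codec reproduces the error from the question.

```python
import os
import tempfile


def write_line(path: str, line: str) -> None:
    """Write line to path as UTF-8, independent of the system locale."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(line)


# Hypothetical sample line in the same shape as the question's template.
sentence = "[0] ACKNOWLEDGE_SVC_PROBLEM;host;service;2;1;1;author;æøåÆØÅ"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "command.txt")  # hypothetical file name

    write_line(path, sentence)

    # Read the text back with the same codec, and also as raw bytes.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
    with open(path, "rb") as f:
        raw = f.read()

    # Forcing the ASCII codec reproduces the failure from the question.
    try:
        with open(path, "w", encoding="ascii") as f:
            f.write(sentence)
        ascii_failed = False
    except UnicodeEncodeError:
        ascii_failed = True

print(round_trip == sentence, ascii_failed)
```

To see which default a given machine would pick when no encoding is passed, `locale.getpreferredencoding(False)` from the standard library reports it.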