
I'm using Python requests 2.2.1 and trying to send a POST request with a custom header.

I'm creating my own header, myheader, like this:

myheader = {'name' : myvalue }

The thing is myvalue is a unicode object. I'm not encoding it to a byte string, just directly putting it in the myheader dictionary.

and when I do:

r = requests.post(myhost, headers=myheader)

I get an exception:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 17-18: ordinal not in range(128)

I guess I could get rid of it by calling myvalue.encode('utf8') before putting the value in the header dictionary. But my question is: is it illegal to put a unicode object in a header? I ask because the response can contain unicode objects with no problem, so why can't I put one in the header?

Martijn Pieters
patchwork

1 Answer


Headers are not unicode data, no. They are not part of the POST body (which is encoded for you as needed, and can otherwise contain any binary data).
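The exception in the question has the shape of an implicit ASCII encode failing on a non-ASCII character. The following is only a minimal sketch that reproduces the same kind of error without any network call; the sample value is made up, and exactly where the HTTP stack performs the implicit encode is an assumption, not something the traceback above shows.

```python
# Hypothetical header value containing one non-ASCII character (U+00E9).
value = u'caf\u00e9'

try:
    # An explicit ASCII encode fails the same way an implicit one would.
    value.encode('ascii')
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)
```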

The vast majority of HTTP headers encode information that only requires the ASCII character set anyway. For example, an Accept-Language header only contains ISO 639 language codes, with optional ISO 3166 country codes, plus q, ;, = and numeric information.

It is generally accepted that HTTP headers may also contain Latin-1 (ISO-8859-1) characters, so anything up to Unicode U+00FF; the HTTP/1.1 Warning header specification, for one, uses Latin-1 as the default. If you need text outside the Latin-1 range in a header, encode it following RFC 2047. In Python, you can do so with email.header.Header() objects:

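from email.header import Header

# .encode() returns the RFC 2047 encoded form on both Python 2 and 3;
# on Python 3, str(Header(...)) would return the unencoded text instead.
myheader = {'name': Header(u'Some unicode value', 'utf-8').encode()}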
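To make the Latin-1 versus RFC 2047 distinction concrete, here is a rough sketch (written for Python 3) that passes Latin-1-safe values through unchanged and wraps everything else as an RFC 2047 encoded word, plus the matching decode for the receiving side. The helper names `encode_header_value` and `decode_header_value` are hypothetical; only `Header`, `decode_header` and `make_header` come from the standard library.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(value):
    """Hypothetical helper: return a str usable as an HTTP header value.

    Values that fit in Latin-1 are returned unchanged; anything else is
    wrapped as an RFC 2047 encoded word, which is pure ASCII.
    """
    try:
        value.encode('latin-1')
        return value
    except UnicodeEncodeError:
        return Header(value, 'utf-8').encode()


def decode_header_value(value):
    """Hypothetical helper: turn an RFC 2047 encoded value back into text."""
    return str(make_header(decode_header(value)))


encoded = encode_header_value(u'\u65e5\u672c\u8a9e')
print(encoded)
print(decode_header_value(encoded))
```

Note that Header.encode() folds long values across several lines by default, which is fine for email but not for a single HTTP header line, so this sketch is only suitable for short values.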
Martijn Pieters
  • Is there an advantage to using `email.header` over just doing `someunicodevalue.encode('utf8')`? – patchwork Jun 06 '16 at 13:55
  • @patchwork: it is standards-compliant, meaning that it should work with *any* HTTP server and not get rejected because the header can't be decoded as Latin-1. – Martijn Pieters Jun 06 '16 at 13:56
  • I have an application, written in Python using Flask, that receives the request. But when I retrieve the headers from Flask's request object, I am given unicode objects. Not a big problem, but confusing in light of your answer (which I have accepted). – patchwork Jun 06 '16 at 14:41
  • @patchwork: yes, the Werkzeug request objects automatically decode headers to Unicode values, by decoding as Latin-1. Or rather, headers are left as passed in from the WSGI environment, which on Python 3 means unicode values decoded from Latin-1. – Martijn Pieters Jun 06 '16 at 15:15
  • @patchwork: related: [Flask - headers are not converted to unicode?](http://stackoverflow.com/q/10124786) – Martijn Pieters Jun 06 '16 at 15:26
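The Latin-1 decoding mentioned in the comments can be illustrated without Flask or a server. This is only a sketch under the assumption that the sender put raw UTF-8 bytes in the header (the `.encode('utf8')` approach) and that the receiving side decoded those bytes as Latin-1; the sample text is made up.

```python
# Hypothetical header text: "naïve" followed by a check mark (U+2713).
original = u'na\u00efve \u2713'

# What the sender put on the wire: the value as raw UTF-8 bytes.
sent_bytes = original.encode('utf-8')

# What the application sees if the bytes are decoded as Latin-1:
# every byte becomes one character, so non-ASCII text turns into mojibake.
received = sent_bytes.decode('latin-1')

# Undo the Latin-1 step to get the bytes back, then decode them as UTF-8.
recovered = received.encode('latin-1').decode('utf-8')

print(received != original, recovered == original)
```

The round trip only works because Latin-1 maps every byte to exactly one code point, so no information is lost in the first decode.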