
I am sending a file to a server as an octet-stream, and I need to specify the filename in the header:

String filename = "«úü¡»¿.doc";
URL url = new URL("http://www.myurl.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.addRequestProperty("Accept", "application/json; charset=UTF-8");
conn.addRequestProperty("Content-Type", "application/octet-stream; charset=UTF-8");
conn.addRequestProperty("Filename", filename);
// do more stuff here

The problem is, some of the files I need to send have filenames that contain non-ASCII characters. I have read that you cannot send non-ASCII text in an HTTP header.

My questions are:

  1. Can you send non-ASCII text in an HTTP header?
  2. If you can, how do you do this? The code above does not work when filename contains non-ASCII text. The server responds with "Bad Request 400".
  3. If you cannot, what is the typical way to get around this limitation?
Asked by guest99; edited by Bruno Rohée.

3 Answers


You cannot use non-ASCII characters in HTTP headers; see RFC 2616. URIs are themselves standardized by RFC 2396, which doesn't permit non-ASCII characters either. The RFC says:

The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.

To use non-ASCII characters in a URI, you need to escape them using the %hexcode syntax (see section 2 of RFC 2396).

In Java you can do this using the java.net.URLEncoder class.
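A minimal sketch of that approach, assuming Java 10 or later (for the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode`) and assuming the server decodes the custom `Filename` header as UTF-8 percent-encoding; the class and method names are hypothetical. Passing the charset explicitly matters: the one-argument `URLEncoder.encode(String)` is deprecated and uses the platform default encoding. Note also that `URLEncoder` produces application/x-www-form-urlencoded output, which turns a space into `+`, so the sketch rewrites `+` to `%20`.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: percent-encode a filename so the header value is pure US-ASCII.
// Assumes the receiving server decodes the value as UTF-8 percent-encoding.
final class FilenameHeader {

    // URLEncoder does form encoding (space -> '+'). A literal '+' in the input
    // is already encoded as "%2B", so rewriting '+' to "%20" is unambiguous.
    static String encodeFilename(String filename) {
        return URLEncoder.encode(filename, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // The receiving side must reverse the encoding with the same charset.
    static String decodeFilename(String headerValue) {
        return URLDecoder.decode(headerValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes for the question's filename, so the source file's
        // own encoding does not matter.
        String filename = "\u00AB\u00FA\u00FC\u00A1\u00BB\u00BF.doc";
        String headerValue = encodeFilename(filename);
        System.out.println(headerValue); // prints %C2%AB%C3%BA%C3%BC%C2%A1%C2%BB%C2%BF.doc
        System.out.println(decodeFilename(headerValue).equals(filename)); // prints true
        // With the HttpURLConnection `conn` from the question:
        // conn.addRequestProperty("Filename", headerValue);
    }
}
```

Percent-encoding only makes the header legal on the wire; whether the server accepts it still depends on the server decoding the value the same way.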

2020 edit: RFC 2616 has since been obsoleted by RFC 7230, and the relevant section on header syntax is now at https://www.rfc-editor.org/rfc/rfc7230#section-3.2:

 header-field   = field-name ":" OWS field-value OWS

 field-name     = token
 field-value    = *( field-content / obs-fold )
 field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
 field-vchar    = VCHAR / obs-text

 obs-fold       = CRLF 1*( SP / HTAB )
                ; obsolete line folding
                ; see Section 3.2.4

Here VCHAR is defined in https://www.rfc-editor.org/rfc/rfc7230#section-1.2 as "any visible [USASCII] character", with the [USASCII] reference being:

[USASCII]     American National Standards Institute, "Coded Character
              Set -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

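The standards are still clear: HTTP header values are meant to be US-ASCII. The only non-ASCII octets the grammar still admits are obs-text (%x80-FF, defined in section 3.2.6), kept for backward compatibility; section 3.2.4 says newly defined header fields SHOULD limit their values to US-ASCII, and recipients SHOULD treat obs-text as opaque data.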
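To make the grammar above concrete, here is a hypothetical checker (not from any library) that accepts a header value only if it consists of visible US-ASCII characters (VCHAR, %x21-7E) with spaces or tabs between them, and deliberately rejects obs-text and obs-fold. It follows the intended reading of field-content (whitespace only between visible characters) rather than parsing the ABNF literally.

```java
// Sketch: check a header value against RFC 7230 field-value syntax,
// restricted to US-ASCII (obs-text %x80-FF and obs-fold are rejected).
final class FieldValueCheck {

    // VCHAR = %x21-7E, "any visible US-ASCII character"
    private static boolean isVchar(char c) {
        return c >= 0x21 && c <= 0x7E;
    }

    // SP / HTAB
    private static boolean isSpaceOrTab(char c) {
        return c == ' ' || c == '\t';
    }

    // Visible characters, with SP/HTAB allowed only between them.
    // CR and LF are neither, so folded (obs-fold) values are rejected too.
    static boolean isAsciiFieldValue(String value) {
        if (value.isEmpty()) {
            return true; // field-value may be empty
        }
        // Leading/trailing whitespace is OWS around the value, not part of it.
        if (isSpaceOrTab(value.charAt(0)) || isSpaceOrTab(value.charAt(value.length() - 1))) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!isVchar(c) && !isSpaceOrTab(c)) {
                return false; // includes every char above U+007E
            }
        }
        return true;
    }
}
```

For example, the question's raw filename fails this check, while its percent-encoded form passes.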

Answered by Bruno Rohée.
Comment from guest99 (Mar 09 '11 at 21:59): Hmm, still not working. I did: conn.addRequestProperty("Filename", URLEncoder.encode(filename));

This might help: HTTP headers encoding/decoding in Java

Answered by Peter Knego.

Actually, you can use non-ASCII characters in headers (see RFC 2616):

   message-header = field-name ":" [ field-value ]
   field-name     = token
   field-value    = *( field-content | LWS )
   field-content  = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>

   TEXT           = <any OCTET except CTLs,
                    but including LWS>

   CTL            = <any US-ASCII control character
                    (octets 0 - 31) and DEL (127)>

   LWS            = [CRLF] 1*( SP | HT )

   CRLF           = CR LF

   CR             = <US-ASCII CR, carriage return (13)>

   LF             = <US-ASCII LF, linefeed (10)>

   SP             = <US-ASCII SP, space (32)>

   HT             = <US-ASCII HT, horizontal-tab (9)>
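A small sketch of the TEXT rule quoted above, applied to raw bytes (class and method names are hypothetical). It illustrates the catch in this reading: octets 0x80-0xFF do satisfy TEXT, but the grammar does not say which charset they represent, so the same filename yields different bytes in ISO-8859-1 and in UTF-8. RFC 2616 (section 2.2) treats TEXT as ISO-8859-1 by default and requires RFC 2047 encoding for other character sets.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the RFC 2616 TEXT rule, applied to the raw octets of a header value.
final class Rfc2616Text {

    // CTL = US-ASCII control characters (octets 0-31) and DEL (127)
    private static boolean isCtl(int octet) {
        return octet <= 31 || octet == 127;
    }

    // TEXT = any OCTET except CTLs, but including LWS.
    // Simplification: SP (32, not a CTL) and HT are accepted; CR and LF, which
    // LWS only permits as a fold followed by SP/HT, are rejected.
    static boolean isText(byte[] octets) {
        for (byte b : octets) {
            int octet = b & 0xFF; // Java bytes are signed; widen to 0-255
            if (isCtl(octet) && octet != '\t') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Unicode escapes for the question's filename.
        String filename = "\u00AB\u00FA\u00FC\u00A1\u00BB\u00BF.doc";
        byte[] latin1 = filename.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = filename.getBytes(StandardCharsets.UTF_8);
        // Both satisfy TEXT, yet they differ: the grammar alone does not tell
        // the server which charset to decode with.
        System.out.println(latin1.length + " " + isText(latin1)); // prints 10 true
        System.out.println(utf8.length + " " + isText(utf8));     // prints 16 true
    }
}
```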
Answered by Maksim; edited by Ben Mosher.
Comment from saille (May 18 '11 at 23:24): RFC 2616 says that you can ONLY use US-ASCII in HTTP headers. Other characters have to be encoded.