
How would I properly generate a "javax.ws.rs.core.Response" (to be returned) that supports Chinese character encoding within an Excel file?

To clarify, I have a CSV file (opened in Excel) that contains some Chinese content, and I need to return a javax response so that the Chinese characters in the document display properly on the client side.

Currently I'm doing the following:

return Response.status( 200 )
        .header( "content-disposition", 
                 "attachment;filename=SampleCSV.csv;charset=Unicode" )
        .entity( result )
        .build();

but when this response is built and returned to the client (and a popup is displayed asking to download the file), the Chinese content of the file is garbled.

Any suggestions would be highly appreciated.

Jonathan Hall
Mohammad Najar

2 Answers


The RFC that defines the content-disposition header doesn't mention a charset clause.

Try also adding a proper content-type header to the response:

.header("Content-Type", "text/csv; charset=utf-8")

Be sure to use utf-8, and not unicode. If that works, then you can remove the charset clause from the content-disposition header.

Sean Reilly

You specify charset=Unicode, which is not valid because Unicode is not a single encoding. It's a character set with a family of encodings. UTF-8 and UTF-16 are commonly-used encodings.
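To make the point concrete, here is a minimal sketch (plain JDK, no JAX-RS) showing that the same two Chinese characters become different byte sequences under UTF-8 and UTF-16. The class name `EncodingSizes` and the helper `hex` are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSizes {
    /** Formats bytes as space-separated upper-case hex (hypothetical helper). */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's own encoding.
        String text = "\u4E2D\u6587";
        System.out.println("UTF-8:    " + hex(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE: " + hex(text.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

UTF-8 needs three bytes per character here (E4 B8 AD E6 96 87), while UTF-16BE needs two (4E 2D 65 87), so "Unicode" alone does not tell a client how to read the bytes.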

You can control the response header, which affects how the browser/client interprets the response, using the @Produces annotation. I've seen different opinions about whether this works.

I'm fairly certain that this only changes the encoding declared in the response headers; it doesn't change the encoding that's actually used to convert the response string into bytes to send over the network. These two must match, otherwise the browser/client will misinterpret the response, because it believes that you used a different encoding than you actually did.
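A rough sketch of that mismatch, using only the JDK: the text is encoded with one charset and decoded with another, which is what a client effectively does when the declared charset differs from the real one. `CharsetMismatch` and `roundTrip` are hypothetical names, and ISO-8859-1 is just an example of a wrong declared charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatch {
    /**
     * Encodes text with the charset the server actually used, then decodes it
     * with the charset the client was told about.
     */
    static String roundTrip(String text, Charset actual, Charset declared) {
        byte[] wire = text.getBytes(actual);
        return new String(wire, declared);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587";

        // Header and bytes agree: the text survives.
        String ok = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(text)); // true

        // Header says ISO-8859-1 but the bytes are UTF-8: each byte becomes
        // its own character, so the result is six unrelated characters.
        String bad = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(bad.equals(text)); // false
    }
}
```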

If you return a java.lang.String object, JAX-RS uses a system default encoding to convert it to a byte stream. If the JAX-RS server is running on Unix this is usually UTF-8, which works well, but on Windows it is typically a locale-specific legacy code page, which often doesn't match what the client expects.

Therefore you should force a specific encoding yourself instead of relying on the default conversion.

To be specific, if result is a java.lang.String object in your code, note that an OutputStreamWriter wraps an OutputStream, not a String, so it can't be constructed around result directly. The simplest way to control the byte stream that JAX-RS writes to the network is to encode the string yourself, with an explicit encoding such as UTF-8, and pass the resulting byte array as the entity. I haven't tested this code, but it might work:

.entity(result.getBytes(java.nio.charset.StandardCharsets.UTF_8))

I had this problem with Tika, which sends a StreamingOutput instead of a Response, and constructs it with a default OutputStreamWriter, which uses the system's default encoding instead of something predictable.

I modified Tika to specify the encoding when constructing the OutputStreamWriter, and added a charset to the @Produces annotation, and that fixed it for me.
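The idea of that fix can be sketched with the JDK alone. The method `writeCsv` below is a hypothetical stand-in for the body of a JAX-RS `StreamingOutput.write(OutputStream)` implementation; it is not Tika's actual code, and `Utf8CsvWriter` and `toUtf8Bytes` are names invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8CsvWriter {
    /**
     * Writes csv to out as UTF-8, regardless of the platform default charset.
     * In a JAX-RS resource this would be the logic inside StreamingOutput.write.
     */
    static void writeCsv(OutputStream out, String csv) throws IOException {
        // Passing the charset explicitly is the whole point: the one-argument
        // OutputStreamWriter constructor falls back to the platform default.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(csv);
        // Flush buffered characters, but leave closing the stream to the caller.
        writer.flush();
    }

    /** Convenience wrapper that captures the written bytes in memory. */
    static byte[] toUtf8Bytes(String csv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            writeCsv(sink, csv);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8Bytes("name\n\u4E2D\u6587\n");
        System.out.println(bytes.length); // 12: 5 for "name\n", 6 for the two characters, 1 for "\n"
    }
}
```

Whatever encoding is passed to the writer should be the same one declared in the charset of the @Produces annotation or Content-Type header.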

qris