I have a strange bug that I have been debugging for about two days. My server builds a string representing an HTML file and sends it to another API to be converted into a PDF file. However, all the non-ASCII characters, such as Chinese, get converted to question marks.

When I inspect these variables in debug mode they all look fine, but once the string is sent the characters are changed to question marks. I tried:

`new String(originalString.getBytes(), StandardCharsets.UTF_8)`

The value returned by this constructor also turns every Chinese character into a question mark.

I can view `originalString` correctly in debug mode with breakpoints, but I can't log it or print it with `System.out.println()`. The printed string has every foreign character turned into a question mark.

I looked inside the String: in the correct String, slot 364 holds a Chinese character with a code of 20000-something, but after conversion it becomes `?` with code 63.

I do have the console charset set to UTF-8.

This is how I am sending the data to the other API. FYI, this is old legacy code written by someone else, so apologies for the horrible style.

String htmlData = princeServices.createPdf(pdfData, teacher, connection);
htmlData = tidyHTML(htmlData, pdfData); // This is the html data, can be viewed in debug mode

URL urlobj = null;
if ("Landscape".equalsIgnoreCase(pdfData.getPrintLayout())) {
    urlobj = new URL(princePath + "Landscape");
} else {
    urlobj = new URL(princePath);
}
HttpURLConnection conn = (HttpURLConnection) urlobj.openConnection();
HttpURLConnection.setFollowRedirects(true);
conn.setRequestMethod("POST");
conn.setDoOutput(true);
conn.setRequestProperty("Authorization", "Basic " + printBase64Binary(/* password here */));
conn.setRequestProperty("Content Length", Integer.toString(htmlData.length()));
outToPrinceServer = new OutputStreamWriter(conn.getOutputStream());
outToPrinceServer.write(htmlData);
outToPrinceServer.flush();
resp.setContentType("application/pdf");
resp.addHeader("Content-Disposition", "attachment; filename=" + "planbook.pdf");
ServletOutputStream outToBrowser = resp.getOutputStream();
ByteStreams.copy(conn.getInputStream(), outToBrowser);

Save my life pls :)

Bobby
  • Strings don't have an encoding. An encoding is what is used to determine what **byte(s)** a **character** will be transformed to/from. It's most likely being corrupted when you're sending it to the other API. How are you doing that? – Kayaman May 24 '17 at 16:29
  • @Kayaman added to the question – Bobby May 24 '17 at 16:57
  • @Kayaman `outputStream.getEncoding()` gives me Cp1252 – Bobby May 24 '17 at 17:12
  • I'm too drunk to help you all the way, but you must **always** use the constructor for `OutputStreamWriter` that takes a charset as well, you don't want to rely on the default being the same everywhere. Probably add a [charset header](https://www.w3.org/International/articles/http-charset/index) in there as well. – Kayaman May 24 '17 at 17:19
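
To make the comments above concrete, here is a minimal sketch (not the asker's actual code) of what the charset passed to `OutputStreamWriter` does to the bytes. The class name `CharsetWriterDemo`, the helper `encode`, and the sample HTML are all made up for illustration. ISO-8859-1 stands in for Cp1252 only because every JVM is guaranteed to ship it; like Cp1252, it has no Chinese characters, so the writer substitutes `?` (code 63).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetWriterDemo {

    // Hypothetical helper: pushes text through an OutputStreamWriter that uses
    // an explicit charset and returns the raw bytes that would go on the wire.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file's
        // own encoding cannot interfere with the demonstration.
        String html = "<p>\u4e2d\u6587</p>";

        // A charset without Chinese characters replaces each one with '?'.
        byte[] latin1 = encode(html, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // <p>??</p>

        // UTF-8 can represent every character, so nothing is lost.
        byte[] utf8 = encode(html, StandardCharsets.UTF_8);
        System.out.println(utf8.length);   // 13 bytes on the wire
        System.out.println(html.length()); // 9 chars, not a byte count
    }
}
```

Applied to the question's code, this would presumably mean `new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8)`. The last two lines of `main` also show why `htmlData.length()` is not a reliable value for a length header once non-ASCII characters are involved: it counts chars, not encoded bytes.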

1 Answer

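`System.out.println()` writes to a "local" console: it encodes text with the JVM's default charset, which depends on your OS language settings. Characters that this charset cannot represent are not displayed and come out as question marks instead.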

See this answer for more info : https://stackoverflow.com/a/27218881/967768
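
As a rough sketch of one way around this, assuming the terminal itself is set up to display UTF-8, you can wrap `System.out` in a `PrintStream` with an explicit encoding instead of relying on the default. The class name `Utf8ConsoleDemo` and the helper `utf8Stream` are made up for this example.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8ConsoleDemo {

    // Hypothetical helper: wraps any stream in a PrintStream that always
    // encodes as UTF-8, regardless of the JVM's default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Whether this renders correctly still depends on the terminal's
        // own encoding and font; the bytes sent to it are UTF-8 either way.
        out.println("\u4e2d\u6587");
    }
}
```

This only changes how the text is printed. It does not affect the bytes sent to the other API, which are determined by the writer used on the connection.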

Demogorii