
I am trying to POST form data encoded in Big5, and I get a response like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="zh-TW">

The Java code looks like this:

Document docs = Jsoup.connect(param)
                     .timeout(30000)
                     .postDataCharset("Big5")
                     .data("syear","104")
                     .data("smonth","6")
                     .data("sday","30")
                     .data("eyear","104")
                     .data("emonth","7")
                     .data("eday","17")
                     .data("SectNO", "不限科別")
                     .data("EmpNO", "不限醫生")
                     .post();

How do I set the charset of the data I send so that I get the correct response?

imei
  • Possible duplicate of [JSoup character encoding issue](http://stackoverflow.com/questions/7703434/jsoup-character-encoding-issue) – Stephan Jan 29 '16 at 10:19

1 Answer

Explanation

As of Jsoup 1.8.3, postDataCharset() only sets the charset used to encode the posted data. That charset is not reused when Jsoup decodes the response body.
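
To make the request side concrete, here is a minimal sketch using only the JDK (not Jsoup itself) of how the chosen charset changes the percent-encoded form value that goes over the wire. The class and method names are hypothetical, `URLEncoder.encode(String, Charset)` assumes Java 10 or later, and the sketch assumes the runtime ships the Big5 charset. Unicode escapes are used so the example does not depend on the source file encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

public class PostCharsetDemo {

    // Percent-encodes a form value with the named charset (hypothetical helper).
    public static String encodeFormValue(String value, String charsetName) {
        return URLEncoder.encode(value, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "\u4e0d\u9650\u79d1\u5225" is the SectNO value from the question.
        String value = "\u4e0d\u9650\u79d1\u5225";

        // Big5 uses two bytes per character here, UTF-8 uses three,
        // so the server receives different bytes for the same text.
        System.out.println("Big5 : " + encodeFormValue(value, "Big5"));
        System.out.println("UTF-8: " + encodeFormValue(value, "UTF-8"));
    }
}
```

A server that expects Big5 form data will only recognise the first form, which is why the charset of the posted data matters independently of how the response is decoded.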

Instead, Jsoup tries to detect the charset from the response itself: it looks at the Content-Type header and then for a meta http-equiv tag that specifies the charset. If it finds neither, it falls back to UTF-8. In your case, that assumption is wrong.
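
The effect of a wrong charset guess can be reproduced without Jsoup. The following is a minimal sketch using only the JDK: it simulates a Big5 response body and decodes it twice. The class and method names are hypothetical, and the sketch assumes the runtime ships the Big5 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseCharsetDemo {

    // Decodes raw response bytes with an explicitly chosen charset (hypothetical helper).
    public static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");

        // "\u4e0d\u9650\u91ab\u751f" is the EmpNO value from the question.
        String original = "\u4e0d\u9650\u91ab\u751f";

        // Simulate a server that sends its body encoded in Big5.
        byte[] body = original.getBytes(big5);

        String guessedUtf8 = decode(body, StandardCharsets.UTF_8);
        String explicitBig5 = decode(body, big5);

        // Only the explicit Big5 decode recovers the original text.
        System.out.println("UTF-8 guess matches: " + guessedUtf8.equals(original));
        System.out.println("Big5 decode matches: " + explicitBig5.equals(original));
    }
}
```

Once the bytes have been decoded with the wrong charset, the original characters are lost, so the charset has to be chosen before the body is turned into a string.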

Workaround

To work around this, don't let Jsoup guess the response encoding for you. Decode the body yourself:

// Requires: import org.jsoup.Connection.Response;
// Let Jsoup fetch the raw response without parsing it
Response res = Jsoup.connect(param)         //
                 .timeout(30000)            //
                 .postDataCharset("Big5")   //
                 .data("syear", "104")      //
                 .data("smonth", "6")       //
                 .data("sday", "30")        //
                 .data("eyear", "104")      //
                 .data("emonth", "7")       //
                 .data("eday", "17")        //
                 .data("SectNO", "不限科別") //
                 .data("EmpNO", "不限醫生")  //
                 .execute();

// Now decode the raw bytes as Big5 ourselves and parse the resulting string
Document docs = Jsoup.parse(
                 new String(res.bodyAsBytes(), "Big5"), //
                 param //
);
Stephan