I am trying to read HTML from Chinese websites and extract their <title>
value. Websites encoded in UTF-8 work fine, but GB2312 websites do not. For example, m.39.net shows 39������_�й����ȵĽ����Ż���վ
instead of 39健康网_中国领先的健康门户网站.
Here is the code I use to accomplish that:
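// urlstr holds the address of the page to fetch
URL url = new URL(urlstr);
URLConnection connection = url.openConnection();
InputStream inputStream = connection.getInputStream();
// Read the whole response body into a String (no charset is specified here)
String content = IOUtils.toString(inputStream);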
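
A minimal sketch of the suspected mechanism, assuming the page bytes really are GB2312 and are being decoded with some other charset (UTF-8 here). The class name CharsetDemo and the helper decode are hypothetical, and the title string is just the expected value quoted above, re-encoded locally so that no network access is needed:

```java
import java.nio.charset.Charset;

public class CharsetDemo {
    // Hypothetical helper: decode raw bytes with an explicitly named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String title = "39健康网_中国领先的健康门户网站";

        // Simulate what a GB2312 server would send over the wire.
        byte[] raw = title.getBytes(Charset.forName("GB2312"));

        // Decoding with the wrong charset yields replacement characters.
        System.out.println(decode(raw, "UTF-8"));

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(raw, "GB2312"));
    }
}
```

If this is indeed the cause, the same idea would presumably apply to the snippet above by passing a charset explicitly, for example through the IOUtils.toString(InputStream, String) overload in Apache Commons IO, assuming that is the IOUtils in use. How to discover the right charset for an arbitrary page (Content-Type header, <meta> tag) is left open here.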