I would like to understand a problem I have been having. I'm parsing an HTML source page and displaying the content I want in a ListView in Android. I parse the page using this command:

  doc = Jsoup.connect(myURL).get();

Symbols such as é or “ ” show up as �. I understand that the encoding mechanism does not recognize them, but is that because of jsoup or Android? The default Android encoding I'm using is UTF-8; shouldn't it support these characters? If not, what should I change it to, and how? Thank you for your help.

Jonh Smith

2 Answers

In ISO-8859-1 (an extension of ASCII), é is the single byte value 233, but in UTF-8 it is the value 195 followed by 169.

You need to know which encoding the characters were saved in, because only the byte values are stored and then interpreted.
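To make the byte values above concrete, here is a minimal sketch using only the Java standard library. The class name `EncodingDemo` and the helper `unsignedBytes` are made up for illustration. It prints the bytes of é in both encodings, then shows what happens when the ISO-8859-1 byte is interpreted as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingDemo {
    // Hypothetical helper: encodes a string and returns its bytes as unsigned values (0-255).
    static int[] unsignedBytes(String s, Charset cs) {
        byte[] raw = s.getBytes(cs);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // Java bytes are signed, so mask to get 0-255
        }
        return out;
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // é

        // One byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println(Arrays.toString(unsignedBytes(e, StandardCharsets.ISO_8859_1)));
        System.out.println(Arrays.toString(unsignedBytes(e, StandardCharsets.UTF_8)));

        // Interpreting the ISO-8859-1 byte as UTF-8 is malformed input,
        // so the decoder substitutes U+FFFD, the replacement character.
        byte[] latin1 = e.getBytes(StandardCharsets.ISO_8859_1);
        String wrong = new String(latin1, StandardCharsets.UTF_8);
        String right = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("\uFFFD")); // decoded with the wrong charset
        System.out.println(right.equals(e));        // decoded with the charset it was saved in
    }
}
```

The same mismatch is most likely what produces the � in the question: the page's bytes are in a single-byte encoding but are being decoded as UTF-8.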

Dominique Fortin

Thank you guys for the help. Making the jsoup call like this:

 Document document = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url);

was the way to go. I then had to find out the real encoding of the webpage (in Chrome you can find it under 'More tools'), and in my case it was windows-1252. One line of code solved the problem:

 doc = Jsoup.parse(new URL(url).openStream(), "windows-1252", url);
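For anyone wondering why the exact charset matters for the “ ” symbols, here is a rough sketch with the standard library only. The class name `Cp1252Demo`, the helper `decode`, and the sample bytes are made up for illustration and are not taken from the actual page. In windows-1252 the bytes 0x93 and 0x94 are the curly quotes, while ISO-8859-1 maps the same bytes to invisible control characters.

```java
import java.nio.charset.Charset;

class Cp1252Demo {
    // Hypothetical helper: decodes raw bytes with the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Formats each character as U+XXXX so the result does not depend on the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative bytes: opening quote, é, closing quote as windows-1252 would store them.
        byte[] raw = { (byte) 0x93, (byte) 0xE9, (byte) 0x94 };

        // windows-1252 yields the curly quotes U+201C and U+201D around é.
        System.out.println(codePoints(decode(raw, "windows-1252")));

        // ISO-8859-1 gets é right but turns the quotes into C1 control characters.
        System.out.println(codePoints(decode(raw, "ISO-8859-1")));
    }
}
```

So ISO-8859-1 can fix é while still mangling the quotes, which is why checking the page's real encoding was worth it.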
Jonh Smith