
There have been many discussions about determining the character encoding of text files and web pages.

For web pages, the simplest approach seems to be a library that takes a URL as input and returns a properly decoded string. Since the library fetches the document itself, it can use the HTTP headers to help determine the encoding.

  1. http://htmlcleaner.sourceforge.net/download.
  2. How do you Programmatically Download a Webpage in Java
  3. character encoding in a web page using java
  4. http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
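The header-driven approach described above can be sketched with the JDK's own `HttpClient` (Java 11+), without any of the linked libraries. The class and method names below (`PageFetcher`, `charsetOf`) are illustrative, not part of any existing API, and the UTF-8 fallback is an assumption:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    // Fetch a URL and decode the body using the charset advertised in the
    // HTTP Content-Type response header, falling back to UTF-8 when the
    // server does not declare one.
    public static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        String contentType = response.headers()
                .firstValue("Content-Type").orElse("");
        return new String(response.body(), charsetOf(contentType));
    }

    // Extract the charset from a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; default to UTF-8 when absent.
    public static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Note this only trusts the HTTP header; a more thorough version would also check the `<meta charset=...>` tag in the HTML itself, since the two can disagree.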

If we want to fetch documents using URLFetch, specifically the async API, what is the best approach or library for determining the encoding?

Are there any libraries that integrate (or could easily be modified to integrate) with async URLFetch?

Nick Siderakis

1 Answer


With URLFetch you get back an HTTPResponse, whose getHeaders() method returns the list of response headers. Look for Content-Type; for web pages its value should be something like text/html; charset=UTF-8, where the charset parameter names the encoding.
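A minimal sketch of that header lookup, kept as plain Java so it is easy to test: with URLFetch you would populate the map from response.getHeaders() (each HTTPHeader exposes getName() and getValue()), and the same code works whether the response came from fetch() or the async fetchAsync(). The class name and the UTF-8 default are assumptions, not part of the SDK:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class CharsetSniffer {
    // Given HTTP response headers as name -> value pairs, find the
    // Content-Type header (header names are case-insensitive) and
    // extract its charset parameter.
    public static Charset fromHeaders(Map<String, String> headers) {
        for (Map.Entry<String, String> h : headers.entrySet()) {
            if (h.getKey().equalsIgnoreCase("Content-Type")) {
                return fromContentType(h.getValue());
            }
        }
        return StandardCharsets.UTF_8; // no header: assume UTF-8
    }

    // Parse "text/html; charset=UTF-8" -> UTF-8; default when absent.
    public static Charset fromContentType(String value) {
        for (String part : value.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Once you have the Charset, decode the response body with new String(response.getContent(), charset).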

Peter Knego