There have been many discussions about determining the character encoding of text files and webpages.
For webpages, the simplest approach seems to be a library that takes a URL as input and returns the correctly decoded string. Because the library fetches the document itself, it can use the HTTP headers to help determine the encoding.
- http://htmlcleaner.sourceforge.net/download.
- How do you Programmatically Download a Webpage in Java
- character encoding in a web page using java
- http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
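
To make the header-based part concrete, here is a minimal sketch using only the JDK. The class and method names (`HeaderCharset`, `fromContentType`, `decode`) are hypothetical, and the UTF-8 fallback is my own assumption rather than something any of the libraries above is known to do. It only reads the `charset` parameter of a `Content-Type` value; it does not look at `<meta>` tags or byte order marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks a charset from a Content-Type header value.
public class HeaderCharset {
    // Matches charset=utf-8 as well as charset="utf-8", case-insensitively.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the header, or the fallback when the
    // header is missing, has no charset parameter, or names an unknown charset.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    // Decodes the raw body bytes, assuming UTF-8 when the header gives no usable charset.
    static String decode(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType, StandardCharsets.UTF_8));
    }
}
```

With an async fetch, the header value and the body bytes would presumably come from the response object once the fetch completes, so a helper like this could be applied after the fact without the library doing the fetching itself.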
If we want to fetch documents using UrlFetch, specifically the async API, what is the best approach or library for determining the encoding?
Are there any libraries that integrate (or could easily be modified to integrate) with async UrlFetch?