
There have been many discussions about determining the character encoding of text files and web pages.

For web pages, the simplest approach seems to be a library that takes a URL as input and returns a properly decoded string. Since the library fetches the document itself, it can use the HTTP headers to help determine the encoding.

  1. http://htmlcleaner.sourceforge.net/download.
  2. How do you Programmatically Download a Webpage in Java
  3. character encoding in a web page using java
  4. http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
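The header-driven approach described above can be sketched with the JDK's own `HttpClient` (Java 11+), without any of the linked libraries. The class and method names below (`PageFetcher`, `charsetOf`) are illustrative, not part of any existing API, and the UTF-8 fallback is an assumption:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    // Fetch a URL and decode the body using the charset advertised in the
    // HTTP Content-Type response header, falling back to UTF-8 when the
    // server does not declare one.
    public static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        String contentType = response.headers()
                .firstValue("Content-Type").orElse("");
        return new String(response.body(), charsetOf(contentType));
    }

    // Extract the charset from a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; default to UTF-8 when absent.
    public static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Note this only trusts the HTTP header; a more thorough version would also check the `<meta charset=...>` tag in the HTML itself, since the two can disagree.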

If we want to fetch documents using URLFetch, specifically the async API, what is the best approach or library for determining the encoding?

Are there any libraries that integrate (or could easily be modified to integrate) with async URLFetch?

Nick Siderakis

1 Answer


With URLFetch you get back an HTTPResponse, whose getHeaders() method returns the list of response headers. Look for Content-Type; for web pages its value should be something like text/html; charset=UTF-8, where the charset parameter names the encoding.
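A minimal sketch of that header lookup, kept as plain Java so it is easy to test: with URLFetch you would populate the map from response.getHeaders() (each HTTPHeader exposes getName() and getValue()), and the same code works whether the response came from fetch() or the async fetchAsync(). The class name and the UTF-8 default are assumptions, not part of the SDK:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class CharsetSniffer {
    // Given HTTP response headers as name -> value pairs, find the
    // Content-Type header (header names are case-insensitive) and
    // extract its charset parameter.
    public static Charset fromHeaders(Map<String, String> headers) {
        for (Map.Entry<String, String> h : headers.entrySet()) {
            if (h.getKey().equalsIgnoreCase("Content-Type")) {
                return fromContentType(h.getValue());
            }
        }
        return StandardCharsets.UTF_8; // no header: assume UTF-8
    }

    // Parse "text/html; charset=UTF-8" -> UTF-8; default when absent.
    public static Charset fromContentType(String value) {
        for (String part : value.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Once you have the Charset, decode the response body with new String(response.getContent(), charset).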

Peter Knego