
I want to read webpage A, which is encoded in ISO-8859-1 (according to the browser), and return its content in UTF-8 as part of the content of webpage B.

That is, I want to show the content of page A in the same charset I use for the rest of page B, which is UTF-8.

How do I do this in Java/Groovy?

Thanks in advance.

user2427
Seems related to http://stackoverflow.com/questions/1466184/convert-ansi-characters-to-utf-8-in-java and possibly http://stackoverflow.com/questions/655891/converting-utf-8-to-iso-8859-1-in-java-how-to-keep-it-as-single-byte – tim_yates Jul 13 '10 at 22:00

2 Answers


In Groovy you could write something like this:

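// Fetch page A, decoding its bytes as ISO-8859-1; the result is a java.lang.String
def source = new URL("http://www.google.com").getText("ISO-8859-1")
// Encode the String as UTF-8 bytes, then decode those bytes back into a String
def target = new String(source.getBytes("UTF-8"), "UTF-8")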
Christoph Metzendorf
    I don't get the 2nd line. According to the Groovy API, `source` will be a (UTF-16) `java.lang.String`. You convert it from a string to a UTF-8 encoded byte array and back to a (UTF-16 encoded) string again. – McDowell Jul 14 '10 at 16:25
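A minimal Java sketch of the point made in that comment (the class and method names are hypothetical). Once `getText("ISO-8859-1")` has decoded the page, the result is an ordinary `String`; encoding it to UTF-8 and decoding it again gives back an equal `String`, so the UTF-8 conversion only has an effect at the point where the text is written out as bytes.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Mirrors the second Groovy line: String -> UTF-8 bytes -> String
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stands in for text already decoded from ISO-8859-1 ("café")
        String source = "caf\u00e9";
        // Prints true: the round trip leaves the String unchanged
        System.out.println(roundTrip(source).equals(source));
    }
}
```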

You don't say what stack you're building on or how you're accessing the content, but the general mechanism for such a transcoding operation is to use UTF-16 as an intermediary; that is, decode the ISO-8859-1 bytes into UTF-16 chars, then encode those chars as UTF-8 bytes.
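A minimal Java sketch of that decode/encode sequence on an in-memory byte array, assuming you already hold the raw ISO-8859-1 bytes of page A (the class and method names are illustrative only):

```java
import java.nio.charset.StandardCharsets;

public class Transcode {
    // ISO-8859-1 bytes -> UTF-16 String -> UTF-8 bytes
    static byte[] latin1ToUtf8(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1); // decode
        return text.getBytes(StandardCharsets.UTF_8);                  // encode
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: 'é' is the single byte 0xE9
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(latin1.length + " -> " + utf8.length);          // 4 -> 5
        System.out.printf("%02X %02X%n", utf8[3] & 0xFF, utf8[4] & 0xFF); // C3 A9
    }
}
```

The byte count grows because every ISO-8859-1 character above 0x7F needs two bytes in UTF-8.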

You could read the characters with an InputStreamReader (using an ISO-8859-1 Charset), then write them out as bytes via an OutputStreamWriter (using a UTF-8 Charset).
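A streaming sketch of the reader/writer pairing described above: the InputStreamReader decodes ISO-8859-1 into chars and the OutputStreamWriter encodes them as UTF-8. The class and method names are hypothetical; in a real application `in` might come from something like `new URL(...).openStream()` and `out` from whatever serves page B.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamTranscode {
    // Copies an ISO-8859-1 byte stream to a UTF-8 byte stream, chunk by chunk.
    // Neither stream is closed here; that is left to the caller.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush(); // push any bytes still buffered in the writer to 'out'
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9}; // "café" in ISO-8859-1
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), out);
        System.out.println(out.size()); // 5
    }
}
```

Closing the writer would also close `out`, which is why the sketch only flushes it.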

Some APIs provide encoding operations as part of their I/O classes (e.g. ServletResponse.getWriter()).

I'm ignoring any need to parse and transform the data, which is a whole other can of worms.

McDowell