I am having difficulty outputting data in UTF-8 format. I have a test case set up where the data I am reading from an input file contains a British pound symbol (hex C2A3). When I write it out on Linux, I get valid UTF-8 (C2A3). On Windows, I only get hex A3.

I tried using a PrintStream and specifying the character set as "UTF-8". No luck. I tried many other streams, also with no luck, until I finally tried a DataOutputStream. I used its "write()" method, which takes a byte array as a parameter. I needed to output a string, so I called "myString.getBytes("UTF-8")".

I ended up with code like:

dataOutputStream.write(myString.getBytes("UTF-8"));

This works properly on both systems: Windows 7 and Linux.

I am trying to understand why this worked and to convince myself that my solution is correct. Does it come down to the system locale? Linux defaults to en_US.utf-8, while all I could specify on Windows was just "en_US". So when the output stream fetched data from the string, was the string supplying its bytes based on the locale?

1 Answer

Are you using a FileOutputStream, where the character encoding matters, or a DataOutputStream, where you write raw binary? You should do some research of your own too, but please take a look here.
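To make the bytes-versus-characters distinction concrete, here is a minimal sketch (not the asker's actual code) that encodes the pound sign with an explicit charset and with the platform default. The class name EncodingDemo and the helper toHex are made up for illustration. ISO-8859-1 is used as a stand-in for a single-byte Windows default such as windows-1252, which maps the pound sign to the same byte; whether that was really the default on the asker's machine is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: renders a byte array as upper-case hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign as a Java (UTF-16) string

        // Explicit charset: the result is the same on every platform.
        System.out.println("UTF-8:      " + toHex(pound.getBytes(StandardCharsets.UTF_8)));      // C2A3
        System.out.println("ISO-8859-1: " + toHex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3

        // No charset given: the result depends on the JVM's default charset.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + toHex(pound.getBytes()));
    }
}
```

The last line is the one that can differ between machines, which is probably why streams and writers created without an explicit charset gave different output on Linux and Windows, while getBytes("UTF-8") did not.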

  • Thanks. It was the OutputStreamWriter I was missing. I was stuck on using a PrintWriter and specifying the charset there, but that just wasn't working. – user1315531 Sep 03 '12 at 23:41
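For anyone landing here later, the OutputStreamWriter approach mentioned in the comment above might look roughly like the sketch below. It is only an illustration under assumptions: the class name Utf8WriterDemo and the helper writeUtf8 are invented, and a ByteArrayOutputStream stands in for the real file stream so the resulting bytes can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriterDemo {
    // Hypothetical helper: writes text to any OutputStream as UTF-8,
    // regardless of the platform's default charset.
    static void writeUtf8(OutputStream out, String text) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush(); // push the encoded bytes through to the underlying stream
    }

    public static void main(String[] args) throws IOException {
        // In real code this would typically be a FileOutputStream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeUtf8(sink, "\u00A3");

        // Show the bytes that were actually written.
        for (byte b : sink.toByteArray()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints: C2 A3
    }
}
```

The key point is that the charset is passed to the OutputStreamWriter itself, so the character-to-byte conversion no longer depends on the system locale.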