I am having difficulty writing data out as UTF-8. My test case reads an input file that contains a British pound sign (U+00A3, which is hex C2 A3 in UTF-8). When I write it out on Linux, I get valid UTF-8 (C2 A3). On Windows, I get only hex A3.
I tried using a PrintStream with the character set specified as "UTF-8". No luck. I tried many other streams, also without success, until I finally tried a DataOutputStream. Its write() method takes a byte array as a parameter, and I needed to output a String, so I called myString.getBytes("UTF-8").
I ended up with code like:
dataOutputStream.write(myString.getBytes("UTF-8"));
This works properly on both systems: Windows 7 and Linux.
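For reference, here is a minimal self-contained sketch of that approach. The class and method names (PoundEncodingDemo, writeAsUtf8, toHex) are made up for illustration, and a ByteArrayOutputStream stands in for my real output file so the bytes can be inspected directly. The second print uses ISO-8859-1 only to show one single-byte encoding that turns the pound sign into A3; I am assuming, not asserting, that something similar is what happens on Windows.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PoundEncodingDemo {
    // Encode the string explicitly as UTF-8 and write the raw bytes,
    // so the platform default charset is never consulted.
    static byte[] writeAsUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Render bytes as upper-case hex; masking with 0xFF avoids sign extension.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // British pound sign
        System.out.println(toHex(writeAsUtf8(pound)));                          // C2A3
        System.out.println(toHex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3
    }
}
```

Because the charset is named explicitly, the first line should be the same on every platform.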
I am trying to understand why this worked and to convince myself that my solution is correct. Does it come down to the system locales? Linux defaults to en_US.utf-8, while all I could specify on Windows was just "en_US". So when the output stream got its data from the string, was the string supplying its bytes based on the locale?
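One way to test this guess would be to run a small diagnostic on both machines. The sketch below (class and method names are hypothetical) prints the JVM's default charset, which is what the no-argument String.getBytes() uses, alongside the bytes produced with and without an explicit charset. If the two hex lines differ on Windows but match on Linux, that would suggest the default charset, rather than the string itself, is what varies between the systems.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetCheck {
    // Render bytes as upper-case hex; masking with 0xFF avoids sign extension.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // No charset argument: the platform default charset decides the bytes.
    static String defaultCharsetHex(String s) {
        return hex(s.getBytes());
    }

    // Explicit charset: the result does not depend on the platform.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // British pound sign
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + defaultCharsetHex(pound));
        System.out.println("UTF-8 bytes:     " + utf8Hex(pound));
    }
}
```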