11

I want to write a string to a file which expects an 8-bit US ASCII encoding.

Which encoding scheme should I use for the method String.getBytes(encodingScheme)?

Thanks.

Saurabh Gokhale
Martin08

3 Answers

13

ASCII is a 7-bit encoding scheme; there is no "8-bit ASCII".

However, many encodings are ASCII-compatible, and some are 8-bit transparent (i.e. every byte sequence maps to a valid character string and vice versa, which is useful if you're sending binary data over a character channel without encoding it in Base64 or similar). If you just want to be ASCII-compatible, UTF-8 is the best choice; if you need 8-bit transparency, use ISO-8859-1.

Note that the above advice is only useful if you want to transport ASCII-only strings or 8-bit binary data. In most cases, you actually want to transfer arbitrary strings, and there's no way around finding out the proper encoding for those.
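To make the distinction concrete, here is a minimal Java sketch (the class and method names are made up for illustration). It assumes Java 7 or later for `StandardCharsets`, and checks whether all 256 byte values survive a decode/encode round trip in a given charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class TransparencyDemo {
    // Decode the bytes with the charset, encode the result again,
    // and report whether the original bytes come back unchanged.
    static boolean roundTrips(byte[] data, Charset charset) {
        String decoded = new String(data, charset);
        return Arrays.equals(data, decoded.getBytes(charset));
    }

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // ISO-8859-1 maps each byte to exactly one character and back.
        System.out.println(roundTrips(all, StandardCharsets.ISO_8859_1)); // true
        // Bytes that are not valid UTF-8 decode to U+FFFD, so they are lost.
        System.out.println(roundTrips(all, StandardCharsets.UTF_8));      // false
        // For pure ASCII text, both encodings produce the same bytes as US-ASCII.
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(ascii, "hello".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```
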

phihag
  • Many thanks. What's the difference between "8-bit compatible" and "8-bit transparent"? – Martin08 Jul 03 '11 at 19:42
  • @Martin08 Added an explanation to the answer. I meant (7bit) ASCII-compatible – phihag Jul 03 '11 at 19:44
  • @Ted Hopp You got 90% of the way to the reason. Imagine a channel **above** the text channel, and you're there. – phihag Jul 03 '11 at 19:46
  • @phihag - Sorry, I deleted my comment after seeing your response to Martin08. Why would you send binary data to a process that expects character data (or vice versa)? There's no reason to prefer ISO 8859-1 over any other encoding unless you know what the other end is using. – Ted Hopp Jul 03 '11 at 19:49
  • @Ted Hopp Maybe because said process was never designed to handle binary data, but it's a useful feature? Maybe because the data you're sending *contains* the charset and you don't want to parse it yourself? (think an HTML document with a `` tag). You're right that there's no reason to prefer ISO-8859-1 over, say, ISO-8859-15, but in these cases you should *not* use UTF-8, as certain binary sequences are not valid UTF-8. – phihag Jul 03 '11 at 19:52
  • 1
    @Francois - I don't think that works. Although every 8-bit value is a legal character in ISO 8859-1, some of the characters (like CR) have special meaning in MIME messages. Plus, MIME is a 7-bit protocol. So the channel itself isn't transparent. I'm not even sure what it means to use ISO 8859-1 "to send binary content" without first using some binary-to-text encoding scheme. While Base64 is about 75% efficient, some other encoding schemes are much better (e.g., yEnc is ~98% efficient). Also, if one uses 8BITMIME, then there's no need for any character encoding at all to send binary data. – Ted Hopp Oct 28 '16 at 17:30
  • @Ted Hopp - Absolutely, ISO8859-1 is not equivalent to 8bit encoding in a MIME message. I'm withdrawing my comment. – Francois Dec 07 '16 at 17:50
4

US-ASCII

The list of encodings is here: https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
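As a rough sketch of what using that name looks like (the class and helper names are hypothetical, and `StandardCharsets` assumes Java 7 or later): `String.getBytes` silently replaces characters that US-ASCII cannot represent with `?`, so a strict variant built on `CharsetEncoder` may be preferable when silent data loss is not acceptable.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class AsciiEncodeDemo {
    // Lenient: unmappable characters are replaced with '?' (0x3F).
    static byte[] encodeLenient(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Strict: a freshly created encoder reports unmappable characters
    // by throwing a CharacterCodingException instead of replacing them.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.US_ASCII.newEncoder().encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Pure ASCII input: one byte per character.
        System.out.println(encodeLenient("hello").length); // 5
        // 'é' is not in US-ASCII, so the last byte becomes '?' (63).
        System.out.println(encodeLenient("café")[3]); // 63
        try {
            encodeStrict("café");
        } catch (CharacterCodingException e) {
            System.out.println("not representable in US-ASCII");
        }
    }
}
```

With the string-based overload from the question, the equivalent call would be `getBytes("US-ASCII")`, which declares the checked `UnsupportedEncodingException`.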

Jesse Barnum
3

There's no such thing as "8-bit ASCII". There are several 8-bit "extensions" to ASCII, including ISO-8859-1 and Windows-1252. Those are probably the most common ones, but they're not the same. You really need to find out exactly which encoding is expected.

Both of those encodings are available under those names in Java - at least they are on my JDK installation. (You may find that Windows-1252 isn't available on a Linux installation, for example.)
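A small sketch of how one might check availability at runtime and see that the two encodings differ (the class and method names are invented for illustration; whether `windows-1252` is present depends on the JDK, so the code guards for it):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class CharsetLookupDemo {
    // Returns the named charset, or null if this JVM does not provide it.
    static Charset lookupOrNull(String name) {
        return Charset.isSupported(name) ? Charset.forName(name) : null;
    }

    // Decodes a single byte value (0-255) in the given charset.
    static String decodeSingleByte(int value, Charset charset) {
        return new String(new byte[] { (byte) value }, charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = lookupOrNull("windows-1252");
        // In ISO-8859-1, byte 0x80 is the C1 control character U+0080.
        int latin1 = decodeSingleByte(0x80, StandardCharsets.ISO_8859_1).codePointAt(0);
        System.out.println(Integer.toHexString(latin1)); // 80
        if (cp1252 == null) {
            System.out.println("windows-1252 is not available on this JVM");
        } else {
            // In Windows-1252, the same byte is the euro sign U+20AC.
            int win = decodeSingleByte(0x80, cp1252).codePointAt(0);
            System.out.println(Integer.toHexString(win)); // 20ac
        }
    }
}
```
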

Jon Skeet
  • There is the extended ascii, i.e. at the bottom of this page: http://www.ascii-code.com/ There are also lots of 8-bit variants of ascii: https://en.wikipedia.org/wiki/ASCII#8-bit – Tamas Rev May 25 '16 at 14:01
  • @Tamas: There's no one encoding properly named "extended ASCII" - there are just several 8-bit extensions, each of which is probably called "extended ASCII" somewhere. In other words, if something identified itself as using "extended ASCII" it would be almost useless. – Jon Skeet May 25 '16 at 14:07