In Java, I have a String and I want to encode it as a byte array (in UTF-8 or some other encoding). Alternatively, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?
13 Answers
Convert from String to byte[]:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
Convert from byte[] to String:
byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);
You should, of course, use the correct encoding name. My examples used US-ASCII and UTF-8, two commonly-used encodings.
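Note that both String methods above silently substitute a replacement character for input they cannot convert. If you would rather have the conversion fail loudly, something along these lines might work. This is only a sketch: the class name StrictUtf8 and its method names are made up for illustration, while CharsetDecoder, CharsetEncoder and CodingErrorAction are the standard java.nio.charset classes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the JDK.
final class StrictUtf8 {

    // Decodes UTF-8 bytes, throwing instead of substituting U+FFFD for bad input.
    static String decodeStrict(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid UTF-8", e);
        }
    }

    // Encodes a String as UTF-8, throwing on input such as an unpaired surrogate.
    static byte[] encodeStrict(String text) {
        try {
            ByteBuffer buffer = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("String cannot be encoded as UTF-8", e);
        }
    }

    // Convenience check built on decodeStrict.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

For example, StrictUtf8.isValidUtf8(new byte[] {0, 0, 0, -127}) returns false, whereas new String(...) would quietly produce a string containing U+FFFD.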
-
This method, however, will not report any problems in the conversion. This may be what you want. If not, it is recommended to use CharsetEncoder instead. – Michael Piefel Aug 17 '11 at 20:57
-
Why did you use `UTF-8` instead of `utf8` (which I always use) ? – Pacerier Jan 12 '12 at 10:54
-
@Pacerier because [the docs for Charset](http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html) list "UTF-8" as one of the standard charsets. I believe that your spelling is also accepted, but I went with what the docs said. – mcherm Jan 17 '12 at 19:44
-
There is a problem when using two of these Strings: comparing them doesn't work. – gal007 Feb 19 '13 at 14:50
-
Since JDK7 you can use StandardCharsets.UTF_8 https://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html#UTF_8 – Rafael Membrives Apr 15 '16 at 09:26
Here's a solution that avoids performing the Charset lookup for every conversion:
import java.nio.charset.Charset;

private final Charset UTF8_CHARSET = Charset.forName("UTF-8");

String decodeUTF8(byte[] bytes) {
    return new String(bytes, UTF8_CHARSET);
}

byte[] encodeUTF8(String string) {
    return string.getBytes(UTF8_CHARSET);
}

-
That's a good point... if performance is critical, then this would save a tiny amount of time. Only significant inside a very tight loop that isn't doing much else, but it could be helpful. – mcherm Aug 06 '10 at 15:39
-
@mcherm: Even if the performance difference is small, I prefer using objects (Charset, URL, etc) over their string forms when possible. – Bart van Heukelom Dec 07 '10 at 09:08
-
Regarding "avoids performing the Charset lookup for every conversion"... please cite some source. Isn't java.nio.charset.Charset built **on top** of String.getBytes and therefore has more overhead than String.getBytes? – Pacerier Jul 14 '12 at 22:43
-
The docs do state: "The behavior of this method when this string cannot be encoded in the given charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required." – paiego Oct 19 '13 at 20:30
-
Note: since Java 1.7, you can use `StandardCharsets.UTF_8` for a constant way of accessing the UTF-8 charset. – Kat Jul 29 '14 at 23:27
String original = "hello world";
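// Passing the Charset constant (Java 7+) avoids the checked UnsupportedEncodingException of getBytes("UTF-8")
byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);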

-
Thanks! I wrote it up again myself adding the other direction of conversion. – mcherm Sep 18 '08 at 00:18
You can convert directly via the String(byte[], String) constructor and getBytes(String) method. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.
90% of the time, such conversions are performed on streams, so you'd use the Reader/Writer classes. You would not incrementally decode using the String methods on arbitrary byte streams - you would leave yourself open to bugs involving multibyte characters.
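To make the multibyte concern concrete, here is a rough sketch comparing chunk-by-chunk decoding via the String constructor with decoding through an InputStreamReader, which keeps decoder state between reads. The class and method names (ChunkedDecodeDemo, decodePerChunk, decodeWithReader) are invented for this example, and the chunk size stands in for whatever buffer size your stream code happens to use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class ChunkedDecodeDemo {

    // Naive approach: decode every chunk on its own. A multibyte character
    // that straddles a chunk boundary is seen as malformed input on both sides.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Reader approach: the InputStreamReader carries partial sequences over
    // from one read to the next, so buffer boundaries do not matter.
    static String decodeWithReader(InputStream in, int bufferSize) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[bufferSize];
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With the three UTF-8 bytes of "a\u00e9" and a chunk size of 2, the two-byte character is split across chunks, so decodePerChunk no longer returns the original text, while decodeWithReader does.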

-
Can you elaborate? If my application encodes and decodes Strings in `UTF-8`, what's the concern regarding multibyte characters? – raffian Dec 03 '13 at 03:45
-
@raffian Problems can occur if you don't transform all the character data in one go. See [here](http://illegalargumentexception.blogspot.co.uk/2009/05/java-rough-guide-to-character-encoding.html#javaencoding_stringclass) for an example. – McDowell Dec 03 '13 at 09:00
My Tomcat 7 installation decodes incoming strings as ISO-8859-1, regardless of the content type of the HTTP request. The following solution worked for me when trying to correctly interpret characters like 'é'.
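byte[] b1 = szP1.getBytes(StandardCharsets.ISO_8859_1); // recover the raw bytes that were received
System.out.println(Arrays.toString(b1)); // b1.toString() would only print the array's identity; needs java.util.Arrays
String szUT8 = new String(b1, StandardCharsets.UTF_8); // decode them with the charset the client actually used
System.out.println(szUT8);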
When I tried US-ASCII instead, the bytes were not preserved, because characters outside the ASCII range cannot be encoded and are replaced.
b1 = szP1.getBytes(StandardCharsets.US_ASCII); // non-ASCII characters become '?' (byte 63)
System.out.println(Arrays.toString(b1)); // needs java.util.Arrays
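For anyone who wants to try this without a servlet container, the same repair can be sketched in isolation. It relies on ISO-8859-1 mapping every byte value 0 to 255 to the code point with the same number, so the wrong decoding loses nothing and can be undone. The class and method names below (MojibakeRepair, reinterpretLatin1AsUtf8) are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes the text really was UTF-8 that got decoded as ISO-8859-1.
final class MojibakeRepair {

    static String reinterpretLatin1AsUtf8(String misdecoded) {
        // Step 1: get back the exact bytes that arrived on the wire.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Step 2: decode those bytes with the charset the sender actually used.
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

As a usage example, the UTF-8 bytes of "\u00e9" (é) decoded as ISO-8859-1 give the two characters "\u00c3\u00a9", and passing that string to reinterpretLatin1AsUtf8 returns the single character é again. If the original bytes were not UTF-8 in the first place, this will make things worse rather than better.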

-
FYI, as of Java 7 you can use constants for those charset names such as [`StandardCharSets.UTF_8`](http://docs.oracle.com/javase/8/docs/api/java/nio/charset/StandardCharsets.html#UTF_8) and [`StandardCharSets.ISO_8859_1`](http://docs.oracle.com/javase/8/docs/api/java/nio/charset/StandardCharsets.html#ISO_8859_1). – Basil Bourque Jun 27 '14 at 23:20
-
Saved my day, working absolutely fine for the first solution mentioned above. – Hassan Jamil Apr 17 '18 at 08:11
-
Correction: it should be [StandardCharsets.UTF_8](http://docs.oracle.com/javase/8/docs/api/java/nio/charset/StandardCharsets.html#UTF_8) and [StandardCharsets.ISO_8859_1](http://docs.oracle.com/javase/8/docs/api/java/nio/charset/StandardCharsets.html#ISO_8859_1) (lowercase 's') – Thomas Mueller Nov 03 '22 at 08:00
As an alternative, StringUtils from Apache Commons can be used.
byte[] bytes = {(byte) 1};
String convertedString = StringUtils.newStringUtf8(bytes);
or
String myString = "example";
byte[] convertedBytes = StringUtils.getBytesUtf8(myString);
If you have a non-standard charset, you can use getBytesUnchecked() or newString() accordingly.

-
Note that this is StringUtils from **Commons Codec**, not Commons Lang. – Arend v. Reinersdorff Feb 29 '16 at 14:08
-
Yes, bit of a gotcha! For Gradle, Maven users: *"commons-codec:commons-codec:1.10"* (at time of writing). This also comes bundled as a dependency with Apache POI, for example. Apart from that Apache Commons to the rescue, as ever! – mike rodent Mar 03 '17 at 18:38
I can't comment but don't want to start a new thread. But this isn't working. A simple round trip:
byte[] b = new byte[]{ 0, 0, 0, -127 }; // 0x00000081
String s = new String(b,StandardCharsets.UTF_8); // UTF8 = 0x0000, 0x0000, 0x0000, 0xfffd
b = s.getBytes(StandardCharsets.UTF_8); // [0, 0, 0, -17, -65, -67] 0x000000efbfbd != 0x00000081
I need b[] to be the same array before and after the round trip, which it isn't (this refers to the first answer).
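The round trip fails because a lone 0x81 byte is not valid UTF-8, so the decoder substitutes U+FFFD, which then encodes as EF BF BD. If the goal is to carry arbitrary binary data in a String and get the same bytes back, two options that should work are ISO-8859-1 (every byte value maps to exactly one character) or Base64. A small sketch, with the class and method names (BinaryRoundTrip and friends) invented for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper showing two lossless byte[] <-> String round trips.
final class BinaryRoundTrip {

    // ISO-8859-1 maps each byte 0..255 to the char with the same value.
    static String toLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static byte[] fromLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Base64 yields printable ASCII, which is safer for logs, JSON, and so on.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

With b = { 0, 0, 0, -127 }, both fromLatin1(toLatin1(b)) and fromBase64(toBase64(b)) give back an array equal to b. Base64 is usually the safer choice if the string has to pass through anything that might re-encode it.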

For decoding a series of bytes to a normal string message I finally got it working with UTF-8 encoding with this code:
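/*
 * Convert a list of UTF-8 byte values (given as decimal strings) to a normal String.
 * Useful for decoding a JMS message that is delivered as a sequence of bytes instead of plain text.
 */
public String convertUtf8NumbersToString(String[] numbers) {
    int length = numbers.length;
    byte[] data = new byte[length];
    for (int i = 0; i < length; i++) {
        // Integer.parseInt plus a cast accepts signed (-128..127) and unsigned (0..255) values;
        // Byte.parseByte would throw NumberFormatException for anything above 127.
        data[i] = (byte) Integer.parseInt(numbers[i]);
    }
    return new String(data, Charset.forName("UTF-8"));
}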

If you are using 7-bit ASCII or ISO-8859-1 (an amazingly common format), then you don't have to create a new java.lang.String at all. It is much faster to simply cast each byte to a char, masking with 0xFF so that bytes above 127 are not sign-extended:
Full working example:
for (byte b : new byte[] { 43, 45, (byte) 215, (byte) 247 }) {
    char c = (char) (b & 0xFF); // a plain (char) b would turn 215 into U+FFD7 instead of U+00D7
    System.out.print(c);
}
If you are not using extended-characters like Ä, Æ, Å, Ç, Ï, Ê and can be sure that the only transmitted values are of the first 128 Unicode characters, then this code will also work for UTF-8 and extended ASCII (like cp-1252).
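One detail worth checking when casting: Java bytes are signed, so any value above 127 has to be masked with 0xFF before the cast, otherwise it is sign-extended into an unrelated character. The following sketch (the class name Latin1Cast and its method are hypothetical) decodes a whole array this way; the result should match what new String(bytes, StandardCharsets.ISO_8859_1) produces.

```java
// Hypothetical helper; only valid for 7-bit ASCII or ISO-8859-1 input.
final class Latin1Cast {

    static String castDecode(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // Mask to 0..255 first: (char) (byte) 215 would be U+FFD7, not U+00D7.
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }
}
```

Whether this is actually faster than the String constructor on a current JVM is something I would measure rather than assume.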

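Charset UTF8_CHARSET = Charset.forName("UTF-8");
String strISO = "{\"name\":\"א\"}";
System.out.println(strISO);
// Encode with an explicit charset: the no-argument getBytes() uses the platform default,
// which only round-trips below if that default happens to be UTF-8.
byte[] b = strISO.getBytes(UTF8_CHARSET);
for (byte c : b) {
    System.out.print("[" + c + "]");
}
String str = new String(b, UTF8_CHARSET);
System.out.println(str);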

Reader reader = new BufferedReader(
        new InputStreamReader(
                new ByteArrayInputStream(
                        string.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8));

//query is your json
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost postRequest = new HttpPost("http://my.site/test/v1/product/search?qy=");
StringEntity input = new StringEntity(query, "UTF-8");
input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response = httpClient.execute(postRequest);

-
Does String Entity convert 'query' to utf-8 or just remember for when attaching the entity? – SyntaxRules Oct 23 '13 at 03:39
Terribly late, but I just encountered this issue and this is my fix:
private static String removeNonUtf8CompliantCharacters( final String inString ) {
    if (null == inString ) return null;
    byte[] byteArr = inString.getBytes();
    for ( int i=0; i < byteArr.length; i++ ) {
        // Java bytes are signed (-128..127), so mask to 0..255 before the range check;
        // otherwise "ch < 253" is always true and every byte above 127 is blanked out
        int ch = byteArr[i] & 0xFF;
        // remove any characters outside the valid UTF-8 range as well as all control characters
        // except tabs and new lines
        if ( !( (ch > 31 && ch < 253 ) || ch == '\t' || ch == '\n' || ch == '\r') ) {
            byteArr[i]=' ';
        }
    }
    return new String( byteArr );
}

-
First, it's not a conversion: it's the removal of non-printable bytes. Second, it assumes that the underlying OS' default encoding is really based on ASCII for printable characters (won't work on IBM Mainframes using EBCDIC, for instance). – Isaac Oct 19 '13 at 22:34