How to convert UTF8 string to UTF16

Question

I'm getting a string by processing a request sent by a client application. I read it as UTF8, but the data is really UTF16, so what ends up in my local string is each letter followed by a \0 character. How can I convert that String into UTF16?

Sample received string: S\0a\0m\0p\0l\0e (UTF8).
What I want is: Sample (UTF16)

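FileItem item = (FileItem) iter.next();
String field = "";
String value = "";
if (item.isFormField()) {
  try {
    // getString() decodes the raw bytes with the item's default charset
    value = item.getString();
    System.out.println("====" + value);
  } catch (Exception e) {
    // error handling is not shown in the question
    e.printStackTrace();
  }
}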

A String is a sequence of characters. The encoding matters only when you transform a String to bytes and vice-versa (when writing or reading to/from a file for example). Show us some code, because what you want to achieve is not clear. — JB Nizet, Nov 16 '12 at 07:29
possible duplicate of [Encoding conversion in java](http://stackoverflow.com/questions/229015/encoding-conversion-in-java) — Has QUIT--Anony-Mousse, Nov 16 '12 at 07:39

Ted Hopp · Accepted Answer · 2012-11-16T14:30:53.027

19

The bytes from the server are not UTF-8 if they look like S\0a\0m\0p\0l\0e. They are UTF-16. You can convert UTF-16 bytes to a Java String with:

byte[] bytes = ...
String string = new String(bytes, "UTF-16");

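Or you can use UTF-16LE or UTF-16BE as the character set name if you know the endian-ness of the byte stream coming from the server. This matters when there is no byte-order mark: Java's "UTF-16" decoder honors a leading BOM but otherwise assumes big-endian, and bytes laid out as S\0a\0m\0p\0l\0e (low byte first) are little-endian, so UTF-16LE is the name to use for them.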
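As a minimal sketch of how the charset name changes the result, the snippet below decodes the sample bytes from the question. It assumes the data has no byte-order mark and that the final e is also followed by a \0; the class and method names (Utf16Decode, decodeLE, decodeDefault) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf16Decode {
    // The question's sample: each ASCII letter followed by a zero byte,
    // i.e. the little-endian layout of UTF-16 (trailing \0 assumed).
    static final byte[] SAMPLE = {
        'S', 0, 'a', 0, 'm', 0, 'p', 0, 'l', 0, 'e', 0
    };

    // Decode with an explicit little-endian charset.
    static String decodeLE(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    // Decode with plain "UTF-16": a leading BOM decides the byte order,
    // and without one Java assumes big-endian.
    static String decodeDefault(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        System.out.println(decodeLE(SAMPLE));                       // Sample
        System.out.println(decodeDefault(SAMPLE).equals("Sample")); // false
    }
}
```

Using the StandardCharsets constants instead of string names also avoids the checked UnsupportedEncodingException that new String(bytes, "UTF-16") declares.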

If you've already (mistakenly) constructed a String from the bytes as if they were UTF-8, you can re-encode it to get the bytes back and decode them as UTF-16 with:

string = new String(string.getBytes("UTF-8"), "UTF-16");

However, as JB Nizet points out, this round trip (bytes -> UTF-8 string -> bytes) is potentially lossy if the bytes weren't valid UTF-8 to start with.
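The sketch below illustrates both cases under stated assumptions: the data is little-endian UTF-16 without a byte-order mark (hence UTF_16LE rather than plain UTF_16), and the names RoundTrip, roundTrip and repair are hypothetical. Plain ASCII text survives the round trip because ASCII bytes and zero bytes are valid UTF-8, while an accented character does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTrip {
    // Decode as UTF-8 (the mistake), then re-encode as UTF-8
    // in an attempt to get the original bytes back.
    static byte[] roundTrip(byte[] original) {
        String misdecoded = new String(original, StandardCharsets.UTF_8);
        return misdecoded.getBytes(StandardCharsets.UTF_8);
    }

    // Repair a String that was wrongly decoded as UTF-8.
    static String repair(String misdecoded) {
        byte[] bytes = misdecoded.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // ASCII-only text: every byte is valid UTF-8, so nothing is lost.
        byte[] ascii = "Sample".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(Arrays.equals(ascii, roundTrip(ascii)));       // true
        System.out.println(repair("S\0a\0m\0p\0l\0e\0"));                 // Sample

        // U+00E9 in UTF-16LE is E9 00. E9 followed by 00 is malformed
        // UTF-8, so the decoder substitutes U+FFFD and the byte is gone.
        byte[] accented = "\u00e9".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(Arrays.equals(accented, roundTrip(accented))); // false
    }
}
```

When the raw bytes are still available, decoding them directly as UTF-16 avoids the problem altogether.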

edited Nov 16 '12 at 14:30

answered Nov 16 '12 at 07:30

Ted Hopp

7

I would say that if he has already constructed a String from the bytes as if it were UTF-8, then there is a bug, and this shouldn't have been done. Not every sequence of bytes is valid UTF-8, and trying to transform random bytes (or UTF-16 bytes) into a UTF-8 String is a potentially lossy process. – JB Nizet Nov 16 '12 at 07:39

score -1 · Answer 2 · edited Jun 20 '20 at 09:12

I propose the following solution:

NSString *line_utf16[ENOUGH_MEMORY_SIZE];

line_utf16= [NSString stringWithFormat: @"%s", line_utf8];

ENOUGH_MEMORY_SIZE must be at least twice the memory used by line_utf8.

I suppose the memory for line_utf16 has to be allocated, dynamically or statically, with at least twice the size of line_utf8.

If you run into a similar problem, please add a couple of sentences!

edited Jun 20 '20 at 09:12

Community

answered Oct 11 '18 at 14:49

matrix3003

The question was about how to do the conversion in Java. – skomisa Dec 20 '19 at 20:36
