Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point using either 2 or 4 bytes.

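UTF-16 encodes each Unicode code point as one or two 16-bit code units, that is, two or four bytes, so it is a variable-width character encoding. Code points in the Basic Multilingual Plane (U+0000 to U+FFFF, excluding the surrogate range U+D800 to U+DFFF) take one code unit. Code points from U+10000 to U+10FFFF take two code units, known as a surrogate pair.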
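You can see the variable width directly with Python's built-in `utf-16-le` codec. It is used here only because it does not prepend a byte order mark. The three sample characters and the helper name `utf16_length` are illustrative choices, not part of any standard API.

```python
# Sketch: UTF-16 byte lengths for code points inside and outside the BMP.
# "utf-16-le" is used so that the codec does not prepend a byte order mark.
samples = {
    "A": "U+0041, Latin capital A",
    "\u20ac": "U+20AC, euro sign",
    "\U0001F600": "U+1F600, grinning face (outside the BMP)",
}


def utf16_length(ch: str) -> int:
    """Number of bytes ch occupies in UTF-16, without a BOM (hypothetical helper)."""
    return len(ch.encode("utf-16-le"))


for ch, name in samples.items():
    print(f"{name}: {utf16_length(ch)} bytes")

# Expected output:
# U+0041, Latin capital A: 2 bytes
# U+20AC, euro sign: 2 bytes
# U+1F600, grinning face (outside the BMP): 4 bytes
```

The first two characters fit in a single code unit. The third needs a surrogate pair, which doubles its size.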

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
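Below is a minimal Python sketch of the encoding steps described in RFC 2781. It works on plain integers rather than a library API. A BMP code point becomes one code unit. Anything above U+FFFF has 0x10000 subtracted, and the remaining 20 bits are split across a high and a low surrogate. The helper names `encode_utf16_units` and `to_bytes_be` are hypothetical. The loop at the end cross-checks the result against Python's built-in `utf-16-be` codec.

```python
import struct
from typing import List


def encode_utf16_units(code_point: int) -> List[int]:
    """Encode one code point as 16-bit code units, following the RFC 2781 steps."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point < 0x10000:
        # BMP code point: a single code unit holding the value itself.
        return [code_point]
    u = code_point - 0x10000      # 20-bit value, at most 0xFFFFF
    w1 = 0xD800 | (u >> 10)       # high 10 bits go into the high (lead) surrogate
    w2 = 0xDC00 | (u & 0x3FF)     # low 10 bits go into the low (trail) surrogate
    return [w1, w2]


def to_bytes_be(units: List[int]) -> bytes:
    """Serialize code units as big-endian unsigned 16-bit integers."""
    return struct.pack(f">{len(units)}H", *units)


# Cross-check against Python's built-in codec for a few code points.
for cp in (0x41, 0x20AC, 0x1F600, 0x10FFFF):
    assert to_bytes_be(encode_utf16_units(cp)) == chr(cp).encode("utf-16-be")

print([hex(u) for u in encode_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```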

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a leading byte order mark (BOM) that signals which byte order follows (see endianness).
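This sketch puts the byte orders side by side. It uses Python's `utf-16-be` and `utf-16-le` codecs and the `codecs.BOM_UTF16_BE` / `codecs.BOM_UTF16_LE` constants. `decode_utf16` is a hypothetical helper. It picks the byte order from a leading BOM and, when there is none, assumes big-endian as RFC 2781 recommends. Other software may apply a different default.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, using a leading BOM (if any) to choose the byte order."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian, following RFC 2781's recommendation.
    return data.decode("utf-16-be")


text = "A\u20ac"                     # U+0041 followed by U+20AC (euro sign)
be = text.encode("utf-16-be")        # big-endian, no BOM
le = text.encode("utf-16-le")        # little-endian, no BOM
with_bom = codecs.BOM_UTF16_LE + le  # little-endian preceded by a BOM

print(be.hex(" "))        # 00 41 20 ac
print(le.hex(" "))        # 41 00 ac 20
print(with_bom.hex(" "))  # ff fe 41 00 ac 20

for data in (be, with_bom, codecs.BOM_UTF16_BE + be):
    assert decode_utf16(data) == text
```

The same two characters produce the same code units in every case. Only the order of the two bytes inside each unit changes, and the BOM tells a reader which order to expect.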

Related tags

unicode: the character set that UTF-16 serializes
Other UTFs: utf-8, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

641 votes, 14 answers

UTF-8, UTF-16, and UTF-32

What are the differences between UTF-8, UTF-16, and UTF-32? I understand that they will all store Unicode, and that each uses a different number of bytes to represent a character. Is there an advantage to choosing one over the other?

unicode utf-8 utf-16 utf utf-32

asked Jan 30 '09 at 17:05 by user60456

483 votes, 9 answers

What are Unicode, UTF-8, and UTF-16?

What's the basis for Unicode and why the need for UTF-8 or UTF-16? I have researched this on Google and searched here as well, but it's not clear to me. In VSS, when doing a file comparison, sometimes there is a message saying the two files have…

unicode encoding utf-8 utf-16

asked Feb 11 '10 at 00:12 by SoftwareGeek (reputation 15,234; 19 gold, 61 silver, 78 bronze badges)

187 votes, 7 answers

What is a "surrogate pair" in Java?

I was reading the documentation for StringBuffer, in particular the reverse() method. That documentation mentions something about surrogate pairs. What is a surrogate pair in this context? And what are low and high surrogates?

java unicode utf-16 surrogate-pairs

asked May 05 '11 at 19:21 by Raymond (reputation 2,004; 2 gold, 13 silver, 10 bronze badges)

170 votes, 10 answers

Can I make git recognize a UTF-16 file as text?

I'm tracking a Virtual PC virtual machine file (*.vmc) in git, and after making a change git identified the file as binary and wouldn't diff it for me. I discovered that the file was encoded in UTF-16. Can git be taught to recognize that this file…

git unicode character-encoding diff utf-16

asked Apr 22 '09 at 15:51 by skiphoppy (reputation 97,646; 72 gold, 174 silver, 218 bronze badges)

151 votes, 5 answers

Difference between UTF-8 and UTF-16?

Difference between UTF-8 and UTF-16? Why do we need these? MessageDigest md = MessageDigest.getInstance("SHA-256"); String text = "This is some text"; md.update(text.getBytes("UTF-8")); // Change this to "UTF-16" if needed byte[] digest =…

java unicode utf-8 utf-16 utf

asked Jan 11 '11 at 07:38 by theJava (reputation 14,620; 45 gold, 131 silver, 172 bronze badges)

109 votes, 7 answers

Convert UTF-8 with BOM to UTF-8 with no BOM in Python

Two questions here. I have a set of files which are usually UTF-8 with BOM. I'd like to convert them (ideally in place) to UTF-8 with no BOM. It seems like codecs.StreamRecoder(stream, encode, decode, Reader, Writer, errors) would handle this. But I…

python utf-8 utf-16 byte-order-mark

asked Jan 17 '12 at 16:37 by timpone (reputation 19,235; 36 gold, 121 silver, 211 bronze badges)

votes

4 answers

Deprecated header <codecvt> replacement

A bit of background: my task required converting a UTF-8 XML file to UTF-16 (with a proper header, of course). So I searched for the usual ways of converting UTF-8 to UTF-16 and found out that one should use templates from <codecvt>. But now when it…

c++ utf-8 c++17 utf-16 codecvt

asked Mar 22 '17 at 08:32 by login_not_failed (reputation 1,121; 2 gold, 11 silver, 19 bronze badges)

votes

5 answers

What's the point of UTF-16?

I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit) then you need UTF-32, since UTF-16 is still variable length. If you don't need this, then…

utf-8 character-encoding utf-16 utf utf-32

asked Mar 13 '11 at 20:28 by dsimcha (reputation 67,514; 53 gold, 213 silver, 334 bronze badges)

votes

5 answers

Difference between Big Endian and Little Endian byte order

What is the difference between Big Endian and Little Endian byte order? Both of these seem to be related to Unicode and UTF-16. Where exactly do we use this?

unicode utf-16 endianness

asked Mar 31 '09 at 15:37 by web dunia (reputation 9,381; 18 gold, 52 silver, 64 bronze badges)

votes

3 answers

Why does .NET use the UTF-16 encoding for string, but uses UTF-8 as default for saving files?

From here: "Essentially, string uses the UTF-16 character encoding form." But when saving via StreamWriter: "This constructor creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM)." I've seen this sample (broken link…

c# .net string utf-8 utf-16

asked Feb 18 '13 at 17:35 by Royi Namir (reputation 144,742; 138 gold, 468 silver, 792 bronze badges)

votes

10 answers

grepping binary files and UTF16

Standard grep/pcregrep etc. can conveniently be used with binary files for ASCII or UTF8 data - is there a simple way to make them try UTF16 too (preferably simultaneously, but instead will do)? Data I'm trying to get is all ASCII anyway (references…

unicode grep utf-16

asked Sep 20 '10 at 15:25 by taw (reputation 18,110; 15 gold, 57 silver, 76 bronze badges)

votes

3 answers

Byte and char conversion in Java

If I convert a character to byte and then back to char, that character mysteriously disappears and becomes something else. How is this possible? This is the code: char a = 'È'; // line 1 byte b = (byte)a; // line 2 char c =…

java encoding unicode utf-16

asked Jul 28 '13 at 20:38 by user1883212 (reputation 7,539; 11 gold, 46 silver, 82 bronze badges)

votes

2 answers

Unicode in C++11

I've been doing a bit of reading around the subject of Unicode -- specifically, UTF-8 -- (non) support in C++11, and I was hoping the gurus on Stack Overflow could reassure me that my understanding is correct, or point out where I've misunderstood…

c++ c++11 unicode utf-8 utf-16

asked Aug 11 '14 at 17:56 by Tristan Brindle (reputation 16,281; 4 gold, 39 silver, 82 bronze badges)

votes

5 answers

Java Unicode String length

I am trying hard to get the character count of a Unicode string and have tried various options. It looks like a small problem, but I am stuck in a big way. Here I am trying to get the length of the string str1. I am getting it as 6, but actually it is 3. moving the cursor…

java string utf-8 utf-16 unicode-string

asked Apr 11 '13 at 11:47 by user1611248

votes

3 answers

Manually converting unicode codepoints into UTF-8 and UTF-16

I have a university programming exam coming up, and one section is on unicode. I have checked all over for answers to this, and my lecturer is useless so that’s no help, so this is a last resort for you guys to possibly help. The question will be…

unicode utf-8 utf-16

asked Jun 04 '11 at 23:41 by RSM (reputation 14,540; 34 gold, 97 silver, 144 bronze badges)
