I have string values that are encoded in UTF-8, but they may also contain Unicode escape sequences.
For example:
"\u0131".encoding
=> #<Encoding:UTF-8>
"\u0131" is "ı".
How can I convert all Unicode chars to UTF-8?
Thanks
Çağdaş
Internally, all Unicode chars in this string are already represented as UTF-8 bytes. Let's check:
> "\u0131".bytes.to_a
=> [196, 177]
OK, there are two bytes, but are they UTF-8 or UTF-16 bytes? The easiest way to check is to look at the binary representation. Let's iterate over the bytes and print each one in binary:
>> "\u0131".each_byte {|b| print b.to_s(2)};puts
1100010010110001
=> nil
This is the binary representation of your string. As you can see, it is the correct two-byte UTF-8 sequence for the char 100110001 (binary), that is, 0x0131:
110 00100 10 110001
---       --         ← UTF-8 markers for 2-byte char
    =====    ======  ← bits of your char
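The same layout can be rebuilt by hand with bit arithmetic. This is only a sketch: `two_byte_utf8` is a hypothetical helper name, not part of Ruby, and it assumes the code point falls in the two-byte range U+0080..U+07FF.

```ruby
# Hypothetical helper: encode a code point in U+0080..U+07FF
# as a two-byte UTF-8 sequence using plain bit arithmetic.
def two_byte_utf8(code_point)
  unless (0x80..0x7FF).cover?(code_point)
    raise ArgumentError, "not a two-byte code point"
  end

  lead = 0b11000000 | (code_point >> 6)               # 110xxxxx: marker + top 5 bits
  continuation = 0b10000000 | (code_point & 0b111111) # 10xxxxxx: marker + low 6 bits
  [lead, continuation]
end

manual = two_byte_utf8(0x0131)
p manual                         # => [196, 177]
p manual == "\u0131".bytes.to_a  # => true
```

The hand-built bytes match what Ruby stores for the string, which is what you would expect if the string is UTF-8.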
So the answer is: do nothing. The string is already UTF-8, Q.E.D.
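If you would rather confirm this in code than by reading bits, a quick check along these lines should do. It assumes Ruby 1.9 or later; the variable names are just for illustration.

```ruby
escaped = "\u0131"

# Build the same character directly from the two bytes shown above
# and tag the result as UTF-8.
from_bytes = [196, 177].pack("C*").force_encoding(Encoding::UTF_8)

puts escaped.encoding        # UTF-8
puts escaped.valid_encoding? # true
puts escaped == from_bytes   # true: the escape and the raw bytes are the same string
p escaped.codepoints.to_a    # [305], i.e. 0x0131
puts escaped.bytesize        # 2 bytes ...
puts escaped.length          # ... but 1 character
```

The `\u0131` escape only exists in the source code. Once the literal is parsed, the string holds the UTF-8 bytes and nothing else.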
UTF-8 is an encoding for Unicode characters. You don't have to convert anything; your characters are already encoded in UTF-8. Whether they are displayed as \u0131 or ı depends on the displaying program.
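You can see both renderings of the same string from Ruby itself. A small sketch, assuming Ruby 1.9 or later, where `String#dump` returns an ASCII-only, escaped form:

```ruby
s = "\u0131"

escaped = s.dump          # escaped, ASCII-only representation of the string
puts escaped              # "\u0131" (including the quotes)
puts escaped.ascii_only?  # true

puts s                    # writes the raw UTF-8 bytes; the terminal decides how to render them
```

The string `s` is the same in both cases; only the way it is presented differs.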