I use the iconv function with the //TRANSLIT option.

When transliterating from UTF-8 to CP1251, are there cases where a single character is replaced by several characters? Where can I look for that information?

MayRiv
  • By "symbol" do you mean *character*? – T.J. Crowder Sep 05 '16 at 13:35
  • Transliteration has nothing to do with encoding. An encoding is a way to represent a character (code point) in binary form, while transliteration replaces characters of one alphabet with characters of another, e.g. Cyrillic letters with Latin ones. Your question is not clear enough. – Sergio Sep 05 '16 at 13:41
  • For example, according to [link](http://php.net/manual/ru/function.iconv.php) (Russian), if I call iconv("UTF-8", "ISO-8859-1//TRANSLIT", "€"), I get the string "EUR" as the result. But when I try to convert to CP1251, I get �. I would like to know whether there are characters that are transliterated into several characters in CP1251. – MayRiv Sep 05 '16 at 13:49
  • I think you could probably use [std::codecvt_byname](http://en.cppreference.com/w/cpp/locale/codecvt_byname). – Galik Sep 05 '16 at 15:00
  • You don't really get �; you get € in CP1251, [which is character number 0x88](https://en.wikipedia.org/wiki/Windows-1251), but your UTF-8 terminal/editor/whatever has no idea how to show it. – n. m. could be an AI Sep 05 '16 at 15:30
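
The point made in the last comment can be checked at the byte level. The following is only a sketch and not part of the original thread: it assumes Python's built-in `cp1251` codec as a stand-in for iconv, and the variable names are made up for illustration.

```python
# Assumption: Python's "cp1251" codec is used here instead of iconv.
euro_cp1251 = "€".encode("cp1251")

# CP1251 has its own code for the euro sign, so the result is a single byte.
print(euro_cp1251.hex())

# A lone 0x88 byte is not valid UTF-8 by itself, so a UTF-8 terminal
# falls back to the replacement character U+FFFD.
shown_by_utf8_terminal = euro_cp1251.decode("utf-8", errors="replace")
print(shown_by_utf8_terminal == "\ufffd")
```

So the conversion itself succeeded; only the display of the resulting byte in a UTF-8 environment went wrong.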

2 Answers

There are some, depending on the implementation and locale:

$ echo '℀⇒½' | iconv -f UTF8 -t CP1251//TRANSLIT
a/c=> 1/2 

These are, respectively, U+2100 ACCOUNT OF transliterated as a/c, U+21D2 RIGHTWARDS DOUBLE ARROW transliterated as =>, and U+00BD VULGAR FRACTION ONE HALF transliterated as 1/2 (including the leading space).

I found these in the GNU libc source code, https://github.com/lattera/glibc/blob/master/locale/C-translit.h.in; other implementations may transliterate these characters differently, or not at all.
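
Transliteration only comes into play for characters that have no code of their own in the target encoding. As a rough way to see which characters those are, here is a sketch that is not from the original answer: it assumes Python's built-in codecs rather than iconv, and `needs_transliteration` is a hypothetical helper name. It only tests for a direct mapping; what iconv actually substitutes still depends on the implementation and locale.

```python
def needs_transliteration(ch: str, encoding: str = "cp1251") -> bool:
    """Return True if `ch` has no direct mapping in `encoding`.

    Hypothetical helper: Python's codecs do not transliterate, so an
    encoding error is used as the signal that iconv would need //TRANSLIT.
    """
    try:
        ch.encode(encoding)
        return False
    except UnicodeEncodeError:
        return True


# Compare CP1251 with ISO-8859-1 for the characters discussed above.
for ch in "€℀⇒½":
    print(
        f"U+{ord(ch):04X}",
        "cp1251:", needs_transliteration(ch, "cp1251"),
        "latin-1:", needs_transliteration(ch, "latin-1"),
    )
```

This also suggests why € turns into EUR for ISO-8859-1 but not for CP1251: ISO-8859-1 has no euro sign, while CP1251 does, so no substitution is needed there.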

ecatmur

The most obvious one is

$ echo 'ß' | iconv -f UTF-8 -t CP1251//TRANSLIT
ss

In addition, if your locale is German, umlauts are transliterated according to German rules (yes, transliteration is locale-dependent).

$ export LC_ALL=de_DE.UTF-8
$ echo 'Füße' | iconv -f utf-8 -t CP1251//TRANSLIT
Fuesse

(Some versions will print F"usse instead.)

n. m. could be an AI